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Abstract 

In  this  Phase  I  STTR  effort,  Frontier  Technology  Inc.  (FTI),  and  University  of  Florida 
(UF)  are  teamed  to  develop  and  analyze  designs  for  automatically  generated  adaptive  statistical 
pattern  recognition  (GASP)  systems  to  detect  targets  against  time- varying  natural  and  manmade 
backgrounds.  This  research  has  been  performed  in  the  context  of  FTI’s  DoD-supported  “TNE” 
(“Tabular  Nearest-neighbor  Encoder’’)  paradigm  for  classify-before-detect  (CBD)  pattern 
recognition  and  ATR.  UF’s  powerful  data  fusion  paradigm.  Morphological  Neural  Nets  (MNNs) 
has  also  been  analyzed  for  this  application.  MNN’s  and  TNE  have  been  proven  highly  successful 
in  a  wide  variety  of  recognition  problems.  Following  preliminary  design,  FTI  and  UF  analyzed 
GASP  system  performance  to  demonstrate  increased  probability  of  detection  (Pd)  with  respect  to 
MNNs  or  more  traditional  Bayesian  systems  alone.  Design  for  real-time  or  near-real  time 
implementations  of  GASP  included  analysis  of  state-of-the-art  embedded  processing 
technologies  such  as  field  programmable  gate  arrays  (FPGAs)  and  single- instruction  multiple- 
data  (SIMD)  array  processors. 

FTI’s  TNE  paradigm  is  built  around  a  very  low  complexity,  vector  search/comparison 
algorithm  based  completely  on  Boolean  operations  among  binary-component,  data-related 
vectors  stored  in  a  large  table.  FTI  has  successfully  applied  the  process  to  several  previous 
applications  with  considerable  success.  Here,  it  forms  the  architectural  basis  for  our  approach  to 
the  derivation  of  time-evolving  inferences  through  the  fusion  of  multiple  sensor/  classifier 
responses  and  the  comparison  of  these  to  a  vectorized  knowledge  base  or  training  set. 
Application  of  the  TNE  process  on  any  (and  all)  signature  vectors  results  in  clusters  of  non¬ 
isolated  ‘Agreement  events”  with  regard  to  the  components  of  every  vector  in  the  training  set,  at 
the  cost  of  only  a  relatively  small  number  of  binary  “and”  and  “or”  processor  operations  within 
the  large  table.  We  generate  a  binary  (Boolean)  element  matrix  ‘Agreement  Map  (AM)”  for 
each  test  vector  (e.g.,  image  patch),  and  then  consider  all  these  related  AMs  in  the  inference 
process.  We  can  even  significantly  suppress  random  noise  “agreements”  by  selecting  only  those 
agreement  events  that  occur  along  with  another  contiguous  (e.g.  adjacent  pixels)  and/or 
contemporaneous  (e.g.,  a  single  pixel  in  subsequent  samples  or  frames)  event  of  the  same  type. 

'  This  is  also  readily  doable  via  a  relatively  small  number  of  Boolean  operations.  Finally,  we 
consider  associations  among  training  set  exemplars.  As  a  result  of  variation  in  object  orientation, 
size,  articulation,  configuration,  etc.;  a  target  is  best  represented  in  the  training  set  by  a  (possibly 
large)  collection  of  exemplars.  We  can  gain  a  measure  of  “inference  strength”  from  a  cluster  of 
agreement  events  within  one  of  these  target-related  exemplar  clusters  by  applying  the  Boolean 
“or”  operation  on  the  respective  AMs.  Thus,  our  TNE  paradigm  facilitates  the  generation  of 
target-related  inferences  based  on  the  fusion  of  (perhaps)  many  diverse  observations  as  compared 
to  collections  of  candidate  exemplars.  The  final  results  of  the  process  are  clusters  of  agreement 
events  indexed  by  training  set  exemplar  group  number,  vector  component  (e.g.,  pixel),  and 
collection  time  (or  spectrum,  or  however  the  data  vectors  themselves  are  indexed).  As  a  part  of 
this  Phase  I  contract,  we  have  implemented  and  partially  optimized  the  above  process. 

We  also  considered  statistical  clustering  and  MNN  approaches  to  further  interpret  the 
composite  “training  set  agreement  event  space”  (i.e.,  a  “stack”  or  related  AMs)  in  order  to 
generate  inferences  with  respect  to  the  viewing  objects.  A  comment:  we  believe  that  our 
approach,  a  strongly  “classify-before-detect”  one,  is  architecturally  similar  to  the  human  visual 
recognition  system. 
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1.0  Introduction 

Frontier  Technology,  Inc.  and  University  of  Florida  (UF)  are  pleased  to  submit  this  Final 
Report  pursuant  to  Army  STTR  02-T004,  Analysis  and  Characterization  of  Pattern  Classifiers. 

Military  and  domestic  targets  in  digital  imagery  can  be  detected  by  a  variety  of 
techniques,  including  hyperspectral  imaging,  change  detection,  and  a  combination  of  broadband 
signature  and  spatial  pattern  recognition  algorithms.  Hyperspectral  imaging  acquires  dense 
signature  information  across  numerous  spectral  bands,  which  can  be  processed  with  techniques 
such  as  component  selection  to  maximize  target-background  separation.  By  locating  areas  where 
target  camouflage  statistics  (e.g.,  hue,  patterning,  and  changes  or  motion)  do  not  match 
background  statistics,  candidate  targets  can  be  identified.  It  is  well  known  that  further 
decomposition  of  spatial  and  spectral  attributes  of  candidate  targets  using  adaptive  filtering  or 
classification  techniques  can  yield  estimates  of  target  presence  at  a  given  location. 

Image-based  automated  target  recognition  systems  are  typically  limited  by  natural  effects 
such  as  statistical,  spectral,  and  geometric  similarity  of  targets  or  target- specific  features  viewed 
against  a  variety  of  naturally-occurring  backgrounds.  In  dynamic  ATR  imagery,  the  movement 
of  targets  among  cover  objects  (e.g.,  overhanging  foliage,  rocks,  etc.)  can  obscure  targets 
sufficiently  to  confound  tracking  algorithms.  A  special  type  of  image  processing  operation  or 
ATR  filter,  called  an  anomaly  detector,  is  able  to  highlight  regions  of  an  image  or  video 
sequence  that’s  statistical,  spectral,  or  geometric  properties  (including  spatial  frequency)  do  not 
match  those  of  its  proximal  background.  ATR  systems,  which  usually  comprise  a  suite  of  ATR 
filters  and  one  or  more  anomaly  detectors,  can  be  used  to  detect  candidate  target  regions  by 
mathematically  combining  (fusing)  the  filter  outputs,  which  can  then  be  processed  by  one  or 
more  pattern  classifiers  to  yield  an  estimation  or  prediction  of  target  location  or  identity.  This 
ATR  strategy  is  called  detect-before-classify  (DBC). 

Unfortunately,  DBC  ATR  systems  have  several  significant  deficits.  Firstly,  accurate 
target  classification  is  predicated  on  the  target  having  been  successfully  detected  and  accurately 
represented  by  a  candidate  target  blob.  Secondly,  the  ATR  filters  are  assumed  to  be  selected  via 
a  priori  knowledge  of  salient  target  characteristics.  Thirdly,  one  assumes  that  the  classifiers  will 
generate  outputs  that  are  not  mutually  contradictory  in  the  context  of  ground  truth.  For  example, 
if  classifier  Ci  claims  a  tank  is  detected,  but  C2  or  C3  respectively  report  that  a  jeep  or  bus  is 
detected,  then  the  classifier  output  will  be  a  less  reliable  estimate  of  actual  target  identity. 
Fourthly,  if  the  ATR  filter  input  (e.g.,  multispectral  imagery)  has  statistical,  spectral,  or  other 
attributes  that  vary  spatiotemporally  in  the  secular  sense  (e.g.,  beyond  mere  intra-  or  inter- frame 
variations),  then  the  classifier  output  reliability  will  drift  as  a  function  of  time.  This  typically 
renders  the  ATR  system  output  nonstationaiy  if  the  system  design  is  based  on  nonadaptive  ATR 
filter  and  classifier  algorithms. 

Another  limitation  of  detect-before-classify  ATR  systems  is  sensitivity  of  the  detection  stage 
to  noise,  spatial  distortion,  and  spectral  shifts  in  the  sensor  response  function.  These  are  but  a 
few  deficits  of  realistic  sensing  systems  that  can  derive  (for  example)  from  poor  hardware 
design,  detector  or  amplifier  malfunction  in  the  presence  of  environmental  hazards;  sensor 
platform  roll,  pitch,  or  yaw;  and  optical  component  response  perturbations  given  nechanical, 
thermal,  or  hydration  stress.  Numerous  other  problems  are  possible,  and  have  been  discussed 
elsewhere  [Sch96].  In  summary,  if  the  detection  stage  is  sensitive  to  noise,  other  systematic 
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error,  or  mission-specific  perturbations,  then  the  information  contained  in  the  candidate  target 
blobs  is  likely  to  be  a  less  accurate  descriptor  of  target  content.  As  a  result,  probability  of 
detection  (Pd)  can  decrease  and  rate  of  false  alarms  (Rfa)  can  increase,  at  significant  liability  to 
missions  depending  on  such  systems. 

In  this  Phase-I  STTR,  we  address  two  solutions  to  the  aforementioned  problems.  Firstly,  we 
have  developed  a  an  approach  to  making  classifier  output  more  reliable  when  (a)  individual 
classifier  design  is  based  on  concepts  of  Bayesian  statistics  (where  significant  a  priori 
information  is  required);  and  (b)  the  characteristic  response  of  multiple  classifiers  employed  in 
an  ATR  system  is  known.  This  has  traditionally  been  achieved  by  applying  evidential  reasoning 
to  multiple  classifier  outputs,  as  illustrated  notionally  in  Figure  1.1.  We  studied  three 
competitive  technologies  for  classifier  output  disambiguation:  Dempster- Schaefer  theory,  fuzzy 
logic,  and  an  innovative  paradigm  called  Morphological  Neural  Nets  (MNNs).  The  best- 
performance  technology  would  be  selected  for  system  design  and  prototype  implementation. 

Secondly,  per  the  Phase-II  goal  of  generating  automated  statistical  pattern  recognition 
systems  (GASP)  we  note  that  there  exists  a  much  more  effective  technique  for  ATR,  called 
classify-bef ore -detect  (CBD)  which  efficiently  combines  detection  and  classification  in  a 
concurrent  operator  (similar  to  the  human  vision  system).  CBD  systems  tend  to  be  more  noise 
tolerant,  and  can  adapt  to  nonstationary  input  using  a  priori  or  a  posteriori  knowledge  including 
a  history  of  filter  performance  under  various  input  conditions.  It  is  important  to  note  that  FTI 
and  UF  have  developed  both  DBC  and  CBD  systems  for  a  wide  variety  of  ATR  applications  for 
numerous  DoD- funded  projects,  and  thus  have  significant  research  and  development  expertise  in 
this  area.  It  is  also  important  to  observe  that,  throughout  this  report,  we  refer  to  Phase-II 
developments  and  results  in  order  to  tie  our  Phase-I  results  into  the  broader  GASP  system 
concept. 


of  Target  Position 
and  Identity 

Figure  1.1:  Schematic  diagram  of  multi-sensor  processing  and  classification  architecture,  in 

support  of  target  detection 

In  contrast  to  model-based,  noise- limited  approaches  developed  by  previous  DBC 
research  efforts  (whether  or  not  statistical  pattern  recognition  was  employed),  we  have  developed 
a  broad-based  system  design  for  GASP  that  would  initially  configure  an  exemplar  pattern 
database  using  a  morphological  NN.  The  resultant  codebook  of  statistical  target  tenplates  would 
be  based  on  significant  prior  knowledge  of  target  attributes,  both  before  and  after  the  ATR 
filtering  process.  Given  this  pattern  database,  FITs  TNE  pattern  recognition  algorithm  can  be 
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used  to  classify  input  data  (e.g.,  multispectral  imagery)  immediately  following  the  sensor  output, 
thus  implementing  classify-before  detect. 

Our  research  team  is  highly  qualified  to  address  all  aspects  of  this  research  and 
development  project.  The  Principal  Investigator,  Mr.  Gary  Key,  has  over  30  years  experience  in 
data/signal  exploitation  and  sensor/processor  characterization.  He  has  led  numerous  research 
efforts  in  these  areas.  Mr.  is  the  inventor/discoverer  of  the  TNE  Paradigm  described  in  this  Final 
Report.  The  Principal  UF  Lead,  Dr.  Mark  Schmalz,  directed  system  design  and  evaluation  at 
UF,  and  was  responsible  for  all  aspects  of  the  research  effort.  Dr.  Schmalz  has  over  15  years 
experience  in  the  design  and  implementation  of  large-scale  software  systems  for  physical  and 
optical  modeling,  as  well  as  target  recognition  and  image  enhancement.  Dr.  Schmalz’  research 
in  target  recognition  and  mine  detection  has  recently  been  funded  by  the  Air  Force  (AFOSR)  arri 
Navy  (ONR),  and  emphasizes  detection  of  natural  or  manmade  targets  (e.g.,  landmines  or 
vehicles)  obscured  by  overlying  cover,  refractive  or  scattering  media  (e.g.,  submerged  targets  or 
sea  mines),  or  atmospheric  obscurants.  The  UF  Co-Lead,  Dr.  Gerhard  X  Ritter,  assisted  in 
theory  development  and  algorithm  design.  Dr.  Ritter  is  Professor  of  Computer  and  Information 
Science  and  Engineering,  Professor  of  Mathematics,  and  Director  of  the  Center  for  Computer 
Vision  and  Visualization  at  University  of  Florida.  Dr.  Ritter  has  over  15  years  experience  in 
DoD- funded  development  of  theory,  algorithms,  and  software  for  computer  vision  applications, 
including  human/vehicle  detection,  tracking,  classification,  and  recognition. 

The  GASP  project  has  yielded  significant  technological  advances  that  can  be  applied  to  areas 
such  as  military  and  domestic  object  detection  applications,  for  example,  battlefield  detection  of 
concealed  defense  assets,  and  (using  technology  developed  in  this  and  other  research  efforts) 
monitoring  of  battle  areas  for  groups  or  structured  arrays  of  concealed  vehicles  or  hazards. 
Adaptations  of  the  proposed  algorithms  and  software  could  be  employed  in  domestic 
applications  such  as  law  enforcement  (e.g.,  contraband  detection),  as  well  as  environmental  or 
wildlife  management  studies  where  passive  surveillance  or  monitoring  of  animal  or  plant 
subjects  is  required. 

1.3.  Project  Scope  and  History 

This  research  project  involved  efforts  in:  (a)  analysis  of  multi-classifier  pattern  recognition 
systems  for  ATR,  (b)  adapting  TNE  and  MNN  algorithms  for  detecting  targets  embedded  in 
naturally-occurring  backgrounds;  (c)  comparative  analysis  of  Bayesian  rule-based  classifiers, 
MNNs,  and  TNE  alone  for  combining  the  results  of  multiple  classifiers  to  support  ATR  with 
increased  Pd  and  reduced  Rfa;  (d)  performance  analysis  of  classifiers  in  c),  above,  that  could  be 
implemented  in  Phase-I;  (e)  design  or  evaluation  of  advanced  classifier  processing  architecture 
concepts  using  MNNs  to  process  the  TNE  “Agreement  Map”;  and  (f)  development  of 
demonstration  capabilities  and  a  Phase-II  proposal  in  support  of  follow-on  research. 

1.4.  Schedule  of  Work  (SOW),  with  Conformance  Data 

The  Schedule  of  Work  (SOW)  for  the  effort  is  listed  below,  with  conformance  or  exceptions 
as  indicated. 

Task  1.  Survey  Evidential  Reasoning  techniques  and  data  for  GASP 

Conformance :  This  task  was  completed  as  proposed,  on  schedule,  as  described  in  the  following 
subtasks. 

ST1.1.  Specify  requirements  for  target  classifier  input/output. 
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Conformance :  Requirements  for  each  post-sensor  classifier  input  were  specified  as  a  vector  of 
Boolean  or  grayscale  values  that  represented  sensor  output,  for  example, 
spectral  coefficients  pertaining  to  target  or  background  regions.  Requirements 
for  the  post-sensor  classifier  output  emphasized  a  classification  score  within  the 
real- valued  interval  [0,1],  or  a  Boolean  score  (e.g.,  target  or  no-target).  These 
scores  would  then  be  input  to  the  classifier  refinement  algorithm,  which  would 
produce  a  list  of  probable  target  identities,  or  an  estimate  of  the  probability  of 
target  presence  given  a  hypothetical  target  identity. 

ST1.2.  Survey  literature  of  evidential  reasoning  for  newest  developments. 

Conformance1.  The  available  literature  at  the  time  of  contract  award  was  surveyed,  and  we  did 
not  find  any  significant  new  research  developments  since  the  date  the  Phase- 1 
proposal  was  submitted. 

ST1.3  Select  best-performance  technologies  on  the  basis  of  accuracy,  adaptability,  and 
efficiency 

Conformance :  We  surveyed  textbook  algorithms  for  Bayesian  inference  and  found  these  to  be 
representative  of  rule-based  classification  algorithms.  Morphological  Neural 
Networks  are  superior  to  other  neural  network  algorithms  for  the  type  of 
classification  envisioned  for  a  GASP  system.  FTI’s  TNE  algorithm  has  been 
demonstrated  to  exhibit  superior  performance  for  template  matching  tasks, 
versus  other  types  of  template  matching  algorithms  studied  in  previous  research. 

Task  2.  Design  and  analysis  of  algorithms,  software,  and  test  imagery  to  support  Classifier 
Output  Fusion  and  Refinement  for  GASP 

Conformance :  This  task  was  completed  without  Subtask  2.3  being  performed  for  TNE 
(although  error  analysis  was  provided  for  Bayesian  and  MNN  based  classifiers). 

ST2.1.  Extend  classifier  performance  analysis  to  include  metrics  for  moving  and  stationary 
targets  in  video  and  still  imagery. 

Conformance ;  This  subtask  was  completed  by  extending  die  Fd  and  Ffa  pixel  fraction  metrics 
that  UF  developed  in  previous  research  to  portray  multiframe  targets.  These 
metrics  are  discussed  in  the  Advanced  Technology  Summary  (Section  7). 

ST2.2.  Specify  enhancements  to  existing  FTI  and  UF  TNE  and  MNN  algorithm  descriptions 
and  software  for  pattern  classifier  refinement  discussed  in  this  proposal.  Incorporate 
UF’s  Morphological  Neural  Networks  into  the  TNE  Paradigm  to  (a)  select  the  TNE 
codebook  per  mission-specific  requirements  and  a  history  of  classifier  performance, 
and  (b)  select  pattern  clusters  for  TNE  agreement  map  processing  to  maximize  Pd  and 
minimize  Rfa. 

Conformance-.  This  UF  subtask  was  redirected  at  the  request  of  Frontier  Technology’s  POC 
Mr.  Key  to  emphasize  MNN-based  processing  of  the  TNE  agreement  map,  with 
research  results  summarized  in  Section  6. 

ST2.3.  Prototype  the  design  of  the  classifier  refinement  system,  and  conduct  noise, 
sensitivity,  and  complexity  analysis  to  support  use  of  GASP  in  the  presence  of  sensor 
noise,  systematic  error,  and  nonergodic  input. 

Conformance :  This  subtask  was  completed  for  Bayesian  classifiers  (Section  4)  and  MNNs 
(Section  5).  The  MNN-specific  theory  allows  one  to  construct  kernel  vectors 
that  are  robust  in  the  presence  of  erosive  or  dilative  noise.  Since  MNNs  are 
robust  classifiers  in  the  presence  of  other  types  of  noise,  this  extends  the  utility 
and  accuracy  of  MNNs  to  provide  not  only  classification,  but  classifier 
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refinement  in  the  presence  of  noise  or  systematic  effects  such  as  thresholding 
error  in  the  TNE  agreement  map. 

Task  3.  End-to-end  system  design  and  Classifier  Refinement  Testing  [E.3.2] 

Conformance:  This  UF  task  was  partially  completed,  but  within  the  constraints  of  the  Phase- 1 
proposed  effort,  which  did  not  include  consideration  of  the  effect  of  MNNs  on 
the  TNE  template  database.  With  the  agreement  of  FTI’s  POC  Mr.  Key,  we 
also  allocated  a  portion  of  the  effort  for  this  task  to  the  theoretical  development 
of  MNN-enhanced  TNE  agreement  map  processing. 

ST3.1.  Design  GASP  system,  design  test  procedures,  and  simulate  performance  using  models 
and  algorithms  developed  in  Tasks  1  and  2. 

Conformance:  Performance  and  complexity  of  MNN-enhanced  TNE  agreement  map 
processing  were  analyzed  theoretically,  which  was  simplified  by  the  fact  that 
MNNs  converge  in  one  pass  of  the  network,  rather  than  oscillating  or 
reverberating  as  do  classical  NNs.  Additionally,  we  re-examined  the 
complexity  of  TNE. 

ST3.2.  Analyze  sensitivity  of  system  to  dynamic  targets  and  target-background  motion 
Include  effects  of  nonergodic  input  on  classifier  refinement  algorithm  learning  rate,  as 
well  as  performance  measures  such  as  Pd  and  Rfa. 

Conformance:  This  subtask  was  completed  for  TNE,  primarily  at  FTI,  with  technical  assistance 
provided  by  UF.  UF  is  continuing  this  work  in  the  context  of  another  DoD- 
funded  project  on  which  UF  has  a  subcontract  with  FTI  as  prime  contractor. 
Further  discussion  of  these  issues  can  be  found  in  the  Advanced  Technology 
Summary  (Appendix  B). 

Task  4.  Management,  Reporting,  and  Marketing  Plan 

Manage  overall  effort,  developing  technical  reports,  final  report  and  internal  and 
external  review  meetings. 

Conformance:  FTI  (with  the  assistance  of  UF)  has  submitted  monthly  technical  reports,  and 
has  written  this  final  report.  We  are  also  collaborating  with  UF  in  writing  the 
Phase- II  proposal,  at  the  request  of  the  sponsor. 


2.0  GASP  System  Concept 

In  this  section,  we  review  the  concept  development  that  was  designed  to  support  our  Phase- 1 
research  for  Army  STTR  02-T004,  Analysis  and  Characterization  of  Pattern  Classifiers.  Section 
2.1  contains  a  discussion  of  project  objectives  and  system  functionality,  Section  2.2  overviews 
the  technique  of  decomposition  that  supported  GASP  related  research  in  multi-paradigm  pattern 
recognition  approaches,  and  Section  2.3  summarizes  implementational  issues,  with  concentration 
on  challenges  that  are  germane  to  the  proposed  Phase-II  research  and  development  effort. 

2.1.  Project  Objectives  and  Achievements 

Consider  a  multi- classifier  system  as  shown,  for  example,  in  Figure  1.1.  Here,  a  sensor 
suite  comprised  of  N  sensors  (e.g.,  radar,  electro-optic,  inertial  or  GPS)  forms  a  projection  of  the 
three-dimensional  scene  within  the  footprint  of  the  sensors.  This  projection  can  contain 
intensity,  spatial,  range  or  spectral  information,  whose  content  and  format  is  idiosyncratic  to  each 
sensor  or  sensor  type,  as  well  as  to  each  target  or  background  object.  In  order  to  process  each 
sensor’s  output  in  a  way  that  is  accurate  in  and  relevant  to  application- specific  objectives,  each 


10 


STTR  Army02-T004  Topic  Title:  Analysis  and  Characterization  of  Pattern  Classifiers; 

GASP  •  Generator  for  Adaptive  Statistical  Pattern  Recognition  Systems  January  31 ,  2003 

of  the  N  sensor  outputs  is  typically  input  to  a  postprocessing  computation  that  is  often  followed 
by  a  classifier.  Alternatively,  several  postprocessed  outputs  can  be  combined  prior  to 
classification.  The  result  is  that  M  <  N  classifier  outputs  are  available,  each  of  which  can 
represent  different  estimates  of  target  presence,  for  example,  target  identity,  probability  of 
detection  or  false  alarm  rate.  In  an  ideal  classification  scenario,  classifier  output  would  also  be 
associated  with  different  data  quality  measures  and  degrees  of  confidence  or  caution. 

The  combination  of  these  disparate  classifier  outputs  is  fraught  with  difficulty,  primarily  due 
to  different  parameters  (e.g.,  a  spectral  detection  process  versus  a  time-of-flight  based  detection) 
and  data  quality  (e.g.,  high  computational  error  corrupting  one  classifier  output,  while  another 
classifier  is  sensitive  to  image  detector  noise  propagated  through  one  of  the  aforementioned 
postprocessing  computations).  Various  techniques  such  as  linear  or  nonlinear  combination, 
Bayesian  or  unsupervised  classification,  and  supervised  learning  techniques  such  as  neural 
networks  have  been  proposed  for  refining  the  output  of  individual  classifiers  or  combining  such 
outputs  to  produce  a  more  accurate  classification. 

The  Topic  Description  for  this  STTR  effort  specified  classifier  refinement,  particularly 
comparison  with  Bayesian  approaches,  as  the  preferred  emphasis.  Accordingly,  we  have 
considered  Bayesian  rule-based  networks  (Section  3),  Frontier  Technology’s  TNE  pattern 
recognition  paradigm  (Section  4),  and  UF’s  Morphological  Neural  Networks  (MNNs)  pattern 
recognition  paradigm  (Section  5)  for  both  classifier  and  post- classifier  processing.  Furthermore, 
we  have  investigated  and  formulated  approaches  for  processing  the  TNE  agreement  map  using 
MNNs  (Section  6).  The  latter  techniques  represent  important  new  developments  in  pattern 
recognition,  which  promise  greater  accuracy  and  information  storage  capacity,  as  well  as 
improved  refinement  of  classified  sensor  output. 

2.2.  System  Decomposition 

Consider  again  the  multi- classifier  system,  in  Figure  1.1.  Here  N  sensors  produce  N  output 
datastreams  that  are  processed  by  ATR  filters  to  extract  salient  features.  The  features  are  then 
classified,  to  yield  preliminary  estimates  of  target  location  or  identity.  In  many  realistic  cases, 
the  individual  classifiers  can  produce  conflicting  results,  due  to  input  noise,  sensor  error  or  noise, 
partial  information,  or  lack  of  coverage  of  the  training  set.  Thus,  a  classifier  refinement  process 
is  applied  to  the  individual  classifier  outputs  to  produce  a  refined  estimate  of  target  location  and 
an  estimate  of  target  identity,  insofar  as  theory,  algorithms,  and  training  data  permit.  The  salient 
parts  of  this  system  are  thus 

•  Sensor  and  Sampling  Process(es),  which  can  include  source  of  noise  and  error,  for 
example,  due  to  image  compression  and  decompression,  undersampling,  or  quantization; 

•  Feature  Extraction  Processes)  that  might  or  might  not  extract  features  accurately,  or  in  a 
manner  that  provides  optimally  useful  information  to  the  follow-on  classifiers); 

•  Feature  Classification,  which  can  be  erroneous  due  to  incomplete  coverage,  lack  of 
contextual  information,  sensitivity  to  error  in  the  feature  extraction  process,  etc.; 

•  Classifier  Output  Refinement,  which  is  expected  to  compensate  for  the  aforementioned 
noise  and  error  effects,  to  provide  a  more  accurate  estimate  of  target  location  and  identity. 

In  Phase-I  research;  we  neglected  the  sensor  and  sampling  effects,  for  two  reasons.  First,  it  was 
expected  that  the  pattern  recognition  paradigm(s)  developed  under  the  scope  of  this  study  would 
be  independent  of  sensor,  sampling,  and  feature  extraction  effects.  Second,  time  and  resources 
did  not  permit  end-to-end  sensor  modeling,  which  was  not  specified  in  the  SOW.  As  a  result,  we 
concentrated  on  pattern  recognition  only,  in  the  context  of  classifier  refinement. 
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2.3.  Implementational  Issues 

When  considering  the  system  diagram  shown  in  Figure  1.1  in  relationship  to  the  three  pattern 
recognition  paradigms  studied  in  this  project,  it  is  reasonable  to  determine  how  each  paradigm 
can  be  decomposed  in  relationship  to  Figure  1.1  and  the  system  decomposition  discussed  in 
Section  2.2.  We  have  found  that  Bayesian  rule-based  classification  covers  the  classifier  and 
classifier  output  refinement  blocks  of  Figure  1.1,  while  FTI’s  TNE  paradigm  covers  all  post¬ 
sensor  processing,  including  sampling  of  the  sensor  data.  In  particular,  TNE  has  as  its  prior 
information  (in  the  Bayesian  sense)  the  target  template  database,  distance  measures  used  to 
compute  the  matching  score  between  a  given  sensor  output  sample  and  a  target  template,  and  the 
thresholding  function  and  parameters  employed  in  transforming  the  match  score  into  a  binary 
number  that  becomes  a  value  in  TNE’s  Agreement  Map. 

Morphological  Neural  Networks  simplify  the  processing  cascade  shown  in  Figure  1.1  by 
directly  operating  on  the  sensor  output,  although  MNNs  can  also  be  applied  to  feature  detector 
output.  Classifier  refinement  is  not  required  when  MNNs  solely  are  employed.  When  MNNs 
are  combined  with  TNE,  then  the  MNN  is  used  to  process  the  Boolean  data  in  the  TNE 
Agreement  Map.  If  the  columns  of  the  AM  are  viewed  as  preliminary  classifier  results  for  a 
given  sample,  then  an  MNN  applied  to  the  AM  can  produce  a  refined,  target- specific  estimate  of 
presence;  location  data  is  provided  by  the  TNE  sample  coordinates. 

Computational  complexity  is  an  issue  for  all  three  paradigms  studied.  Bayesian  rule-based 
pattern  recognition  requires  a  minimum  of  O(log  N)  time  for  runtime.  Build-time  (rule 
assembly)  work  varies  with  the  complexity  of  the  feature  vector  and  training  set  size.  TNE 
requires  considerable  build-time  overhead  to  construct  the  binary  pointer  table  from  which  the 
agreement  map  is  derived.  However,  the  run-time  classification  is  I/O-intensive  primarily,  as 
discussed  in  Section  4.  For  example,  given  an  N-point  sample  and  a  template  database 
comprised  of  M  targets,  0(MN/W)  I/O  operations  are  required  to  build  each  AM  on  computers 
with  maximum  W-bit  integer  arithmetic  (usually  W=32  or  W=64),  while  each  target  vector  can 
be  post-processed  in  the  classifier  refinement  step  in  0(N/W)  time.  Since  TNE  is  a  massively 
parallel  pattern  recognition  paradigm,  these  complexities  can  be  straightforwardly  reduced 
through  the  use  of  multiple  parallel  buses  and  processing  elements,  as  discussed  in  [Key99]. 
MNNs  are  the  most  efficient  of  the  three  paradigms  studied  herein:  build-time  activity  pertains 
directly  to  computation  of  the  MNN  weight  matrix,  which  requires  very  small  training  times. 
Runtime  classification  is  accomplished  in  one  pass  of  the  net:  for  an  N-input  net,  this  amounts  to 
0(N)  computation  on  an  N-processor  machine. 

As  shown  in  previous  research  [Key99,  Rit98],  both  TNE  and  MNNs  are  tolerant  of  input 
noise  and  missing  information.  This  claim  cannot  be  made  globally  for  Bayesian  classifiers, 
where  one  must  know  all  a  priori  probabilities,  and  be  able  to  completely  and  accurately 
calculate  the  a  posteriori  probabilities  for  the  entire  expected  target  set.  Additionally,  MNNs  can 
perform  a  limited  amount  of  interpolation  to  support  classification  with  partial  information, 
while  neither  TNE  nor  Bayesian  rule-based  classifiers  are  capable  of  interpolation. 

As  a  result  of  our  Phase-I  research;  we  envision  the  following  implementational  challenges  in 
a  Phase-II  research  effort: 

•  Quantifying  the  performance  of  TNE,  MNNs,  and  the  TNE/MNN  combination  in  the 
presence  of  input  noise,  error,  and  partial  information; 

•  Quantifying  the  performance  limits  on  MNN-directed  processing  of  the  TNE  agreement 
map,  to  determine  limits  of  utility  on  TNE/MNN  versus  MNN  or  TNE  alone; 
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•  Effectiveness  with  which  MNNs  can  be  used  to  cluster  target  exemplars  in  TNE’s 
template  database,  to  achieve  more  efficient  pattern  recognition  without  degrading  Pd  or 
Rfa  performance;  and 

•  Hardware  design  and  implementation  for  TNE/MNN  applications,  to  achieve  optimal  or 
near-optimal  space/time/error  tradeoffs  for  various  embedded  processing  scenarios. 

We  next  provide  a  theoretical  overview  of  Bayesian  classifiers,  with  emphasis  on  classifier 
error  and  performance  metrics. 

2.4.  References 

[Key99]  Key,  G.,  M.S.  Schmalz.  F.M.-  Caimi,  and  G.X.  Ritter.  “Performance  analysis  of 
tabular  nearest  neighbor  encoding  algorithm  for  joint  compression  and  ATR”,  in 
Proceedings  SPIE  3814:115-126  (1999). 

[Rit98]  Ritter,  G.X.,  P.  Sussner,  and  J.L.  Diaz  de  Leon.  “Morphological  associative 
memories”,  IEEE  Transactions  on  Neural  Networks  9(2):281-293  (1998). 


3.  Bayesian  Rule-Based  Pattern  Recognition  Paradigm 

Bayesian  decision  theory  is  a  statistical  pattern  classification  approach  that  is  based  on  two 
strong  assumptions:  (1)  a  given  decision  or  classification  problem  can  be  posed  in  probabilistic 
terms,  and  (2)  all  of  the  relevant  probabilities  are  known.  Without  initially  considering  either 
epistemological  basis  or  realism  of  these  assumptions,  we  first  present  a  mathematical 
description  of  Bayesian  classifier  theoiy  (Section  3.1).  We  then  discuss  techniques  and  metrics 
by  which  the  performance  (e.g.,  classification  accuracy)  of  pattern  classifiers  in  general,  and 
Bayesian  classifiers  in  particular,  can  be  quantified  (Section  3.2).  Section  3.3  contains  an 
overview  of  Bayesian  classifier  refinement  techniques.  Finally,  in  Section  3.4,  we  note  the 
shortcomings  of  die  Bayesian  paradigm  relative  to  statistical  techniques  such  as  FTTs  TNE 
paradigm  or  neural  networks. 

3.1.  Mathematical  Description  of  Bayesian  Pattern  Recognition 

Let  us  begin  by  considering  a  simple  problem  of  classifying  candidate  targets  in  digital 
monochromatic  imagery.  In  particular,  assume  that  a  given  target  area  in  an  image  (also  called 
mAOI)  can  be  classified  according  to  its  brightness  or  grayscale  value  g,  as  target  (g  =  g)  or 
non-target  (g  =  g).  We  generalize  this  assumption  by  saying  that  there  is  an  a  priori  probability 
Pr(gi)  that  the  current  AOI  is  a  target  and  some  a  priori  probability  Pr(g)  =  1  -  Pr(g)  that  it  is 
not  a  target.  If  the  decision  process  is  automated,  then  the  simplest  approach  is  to  decide  that  a 
given  AOI  is  a  target  if  Pr(gi)  >  Pr(g);  and  conversely  a  non-target  object.  In  general,  the 
probability  of  error  is  the  smaller  of  Pr(g)  and  Pr(g). 

In  practice,  we  view  grayscale  intensity  g  as  a  continuous  random  variable  whose  distribution 
depends  upon  multiple  constraints  (e.g.,  target  size,  shape,  and  surface  texture  or  color;  lighting 
angle(s)  and  color(s);  environmental  variables  such  as  atmospheric  temperature,  humidity,  or 
windspeed  in  the  case  of  infrared  imagery;  camera  lens,  detector,  and  amplifier  spatial  and 
intensity  response,  etc.)  Let  p(g  |  g)  denote  the  state-conditional  probability  density  function  for 
g,  given  that  the  state  of  the  target  recognition  environment  is  g,  where  I  =  1  or  I  =  2  is  possible 
in  our  highly  simplified  example.  Then,  the  difference  in  brightness  between  target  and  non¬ 
target  objects  is  described  by 

Ag  =  p(g|  g\)-p(g  |  &). 
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Now  let  us  make  this  problem  more  realistic  by  assuming  that  we  know  Pr(g)  and  Pr(g>)  as 
well  as  pig  |  gi)  and  pig  |  g2).  Additionally,  suppose  we  can  measure  g  for  a  given  AOI.  Bayes’ 
Rule  shows  that  observing  the  value  of  g  changes  the  a  priori  probability  Pr(g)  to  the  a  posteriori 
probability  Pr(g  |  g),  as  follows: 

ftfc  ,  (3.,) 

where  the  composite  probability  (also  called  mixture  density)  p(x)  is  given  by 

P(x)=  I/>(g|gi)'Pr(gi)  •  (3.2) 

i=l 

If  we  have  a  value  g  for  which  Pr(g  |  g)  >  Pr(g2 1  g),  then  it  is  reasonable  to  decide  that  a  target  is 
present;  conversely,  a  non-target  object  is  recognized. 

Now  let  us  consider  the  probability  of  error  in  this  simple  example.  Whenever  we  measure  a 
particular  grayscale  value  g,  the  probability  of  an  error  £  is  given  by 

Pr(e  |  g)~  1  AOI is  classified  as  target  ^  ^ 

**  [Pr(g2  |g)  if  AOI  is  classified  as  non-target ' 

As  before,  we  can  minimize  the  probability  of  error  by  deciding  target  when  Pr(g  |  g)  >  Pr(g2 1 
g),  and  non-target  conversely.  This  estimation  is  correct  because  the  average  probability  of  error 
is  given  by 

Pr(e)  =  J  ^  Pr(e,x)<hc  =  J  ^  Pr(e  |  x)^(x)  dx .  (3.4) 

If,  for  every  value  of  g,  Pr(e  |  g)  is  minimized,  then  the  preceding  integral  is  minimized. 

There  are  well-known  arguments  that  the  scale  factor  p(x)  is  inherently  unimportant  because 
it  merely  ensures  that  the  a  posteriori  probabilities  sum  to  unity.  Elimination  of  p(x)  yields  the 
following  Bayesian  decision  rule: 

Classify  AOI  as  target  if  pig  |  gi)Pr(gi)  >  pig  |  g>)Pr(g2),  otherwise  classify  as  non-target. 

Let  us  now  consider  these  concepts  in  light  of  the  two-class  Bayesian  decision  problem  inherent 
in  target  versus  non-target  discrimination. 

3.1.1.  Two -Class  Problem.  Let  x  =  (xi,  )§,  ...,  y^)  denote  a  pattern  vector  comprised  of  n 
features,  which  is  to  be  classified  in  class  c\  or  c2.  Denoting  the  a  posteriori  probability  of  c\ 
given  x,  we  have  previously  observed  that,  in  the  simplest  classifier  based  upon  probabilities 
only,  x  is  in  class  c\  if  Prfyi  |  x)  >  Pr(c2  |  x)  or,  conversely,  in  c2.  In  Equation  (3.1),  we  showed 
that  the  a  posteriori  probability  Prfyi  |  x)  can  be  calculated  from  the  a  priori  probability  Prfyi),  for 
i  =  1  or  i  =  2,  using  the  Bayes  rule: 

Prfyi  |  x)  =  Prfyi) pic ;  |  x)  / p(x)  , 

where  p(x)  denotes  the  mixture  density  function  referred  to  as  a  scale  factor  in  the  preceding 
discussion.  Since  p(x)  is  positive  and  common  to  both  sides  of  the  inequality  inherent  in 
Bayesian  decision,  we  can  write  the  decision  rule  as 


Prfyi) p(c\  |  x)  ^  Pr (C2)p(c2  |  x) . 
c2 


This  can  also  be  expressed  in  terms  of  a  likelihood  ratio  L(x),  defined  as 


14 


STTR  Army02-T004  Topic  Title:  Analysis  and  Characterization  of  Pattern  Classifiers; 

GASP  -  Generator  for  Adaptive  Statistical  Pattern  Recognition  Systems  January  31 ,  2003 


rf-\  P(°\  1X)  ^  ^2) 

MXj%(c2|x)c<2Pr(q)’ 


where  Pr(c2)/Pr(ci)  is  called  the  threshold  value  of  the  likelihood  ratio  for  the  decision.  Writing 
this  in  logarithmic  form,  we  have  the  minus  log  likelihood  ratio 

h(x)~—  log  {L  (x))  =  —  log  (p(c\  |  x))  +  log(p(c21  |  x))  £  log  [^7^1  =  log(Pr(ci))  -  log(Pr(c2)) 

c2  “rlc2  )  J 

which  is  also  called  the  discriminant  function.  The  comments  pertaining  to  decision  error,  which 
surround  Equations  (3.3)  and  (3.4),  also  apply  to  this  discussion. 

3.1.2.  Implementational  Issues.  It  is  possible  to  compute  h  given  specific  assumptions  about 
prior  and  posterior  probabilities.  For  example,  when  the  values  p(c{  \  x)  are  normal  with 
expected  vectors  Mi  and  covariance  matrices  V\,  then  the  decision  rule  becomes 
A(x)  =  -log(T(x)) 


=  I(x-M1)TFf1(x-M1)-I(x-M2)TF2-1(x-M2)+ilog  {  logj^^Mj(3-7) 

Equation  (3.7)  illustrates  that  the  decision  boundary  is  expressed  quadratically  in  x.  However, 
when  Vi  =  V2  =  V,  the  boundary  is  a  linear  function  of  x,  expressed  as 


h{x)  =  {M2-Ml)TV~lx+±\piJv~lMl-M2V~lM2)  i  log 


gM) 

p^2)r 


where  F  denotes  the  system  covariance  matrix. 

The  consideration  of  covariance  matrices  leads  to  a  special  case  where  M\  =  0  and 

i  ~  n-l 

1  Pi  •••  Pi 

y.  —  Pi  ^ 

Y\  ~~  .  > 

••  Pi 

^n-1  ^  , 

LPi  Pi  1  J 

which  can  be  seen  where  stationary  random  processes  are  sampled  temporally  to  form  random 
vectors.  Note  that  Fj-1  and  |F|  are  expressed  for  this  form  of  V\  as 


'1  -Pi  0  •••  0 

-Pi  1+pf  "Pi 

vrl= - -=  0  ••.  V  \  0 

1“P2  ■  i  2 

:  •.  -Pi  1+pf  -Pi 


0  and  |  Fj  |  =  (l— p?  'f  \ 


[0  -  0  -Pi  1  J 

This  allows  us  to  express  the  quadratic  form  of  Equation  (3.7)  as 
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1+Pl  1+P2  y  2  _  Pi  P2 

1-P?  1-P2  i=l  1  1-Pl  1-P2 


'  (xf  +x») 


2p^  +(„_1)logfl_PL>|  >  iog 

1-Pl  1  - P2  Ji=l  [l-P2  P 


Pr(q) 

Pr(c2) 


where  the  second  term  portrays  the  boundary  effect  of  finite-length  observation  of  random 
processes,  which  diminishes  with  increasing  n.  Ignoring  the  second  and  fourth  terms,  and  setting 
log(Pr(ci))  /  log(Pr(c2))  =  0  [e.g.,  Pr(ci)  =  Pr(c2)],  one  can  express  the  decision  rule  as 

(ZxiXi+1  )/Xx?  <  T,  where  T  denotes  a  threshold. 

In  other  words,  classification  occurs  by  thresholding  an  estimate  of  the  correlation  coefficient. 
This  is  reasonable,  since  vj  ^  v2  is  (in  this  case)  the  only  difference  between  c\  and  C2. 

Another  interesting  well-known  case  occurs  when  the  pattern  vectors  are  mutually 
independent  and  exponentially  distributed,  which  implies  that 

n  1  ~Ts(xi) 

P(ci  |x)=n  — ea,J  ,  (3.10) 

j=laij 

where  3j  denotes  the  exponential  distribution  parameter  for  5  and  c,,  with  s  denoting  the  step 
function.  As  a  result,  h(x)  in  Section  3.1.1  can  be  rewritten  as 


,  .  n  1  1  n 

Mx)  =  X - xj+  X  log 

j=l[alj  a2j  J  j=l 


(3.11) 


In  this  case,  the  Bayes  decision  rule  becomes  a  linear  function  of  the  terms  55. 


3.2.  Overview  of  Classifier  Performance  Analysis 

In  order  to  determine  whether  or  not  a  classifier  is  performing  correctly,  we  need  to  take  into 
account  the  concepts  of  cost  and  risk.  For  example,  in  ATR  practice,  one  might  prefer  to  weight 
different  target  or  non-target  objects  according  to  their  importance.  For  example,  the  cost 
associated  with  a  classification  as  tank  is  likely  to  be  of  different  kind  and  degree  than  the  object 
type  schoolbus.  In  the  two-class  problem  described  herein,  the  misclassification  of  a  non-target 
object  (e.g.,  a  schoolbus)  as  a  target  (e.g.,  a  tank)  is  likely  to  be  significantly  more  damaging  than 
the  converse.  Accordinagly,  we  present  the  following  brief  discussion  of  cost  weighting  in 
Bayesian  decision  paradigms. 

If  Cij  denotes  die  cost  of  deciding  that  a  feature  vector  derived  from  an  observation  (e.g.,  an 
AOI  in  an  image)  is  a  member  of  class  c\  (written  as  xe  q)  when  xecj  is  the  correct 

classification,  then  the  conditional  cost  of  deciding  xe  q,  which  we  denote  as  n(x),  is  given  by 

ri  (x)  =  Ca  ■  Pr(ci  |  x)+  Ci2  ■  Pr(c2  |  x)  .  (3.12) 

The  decision  rule  and  its  resulting  conditional  cost  rix)  given  x  are  expressed  as 
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^lM  i  r2(*)  311(1  r(x)  =  mtti(r1(x),r2(x)) . 


(3.13) 


The  total  cost  r  of  this  decision  is  given  by  the  expected  value  of  r(x): 
r  =  £(|Kx)|)  =  Jmin  fa  (x),  r2  (x))  •  p(x)  dx 

=  \Li (CU  Pr(ci )p{ci  |  x)+C12  Pr(c2 )p(c2  \x))dx+ j  (C21  Pr(c1)p(c1 1  x)+C22  Pr(c2)p(c2  |x)) 
where  L\  andZ,2  are  determined  by  the  decision  rule  in  Equation  (3.13). 

The  decision  boundary  that  minimizes  r  in  the  preceding  equation  can  be  calculated  as  follows. 
First,  we  rewrite  the  preceding  equation  as  a  function  of  L\,  by  replacing 

fL  P(c i  I  x)^x 

with 

l~\LxP^\x)dx  ’ 

which  can  be  done  since  L\  and  Li  are  disjoint  but  cover  the  entire  domain.  We  thus  obtain 
r  =(c21  Pr(cl)+c22  Pr(c2  ))  +  [(Cn  -  C21  )Pr(c!  )p(q  |x)+(C12  -  C22  )Pr(c2  )p(c2  |  x)]  dx 

(3.14) 

By  appropriate  choice  of  L\  (where  possible),  r  can  be  minimized.  If,  for  a  given  x,  the  integrand 
of  Equation  (3.14)  is  negative,  then  r  can  be  decreased  by  assigning  x  to  L\.  If  the  integrand  is 
positive,  it  is  possible  to  decrease  r  by  assigning  x  to  Li.  Thus,  the  minimum-cost  decision  rule 
assigns  to  L\  the  values  of  x  for  which  the  integrand  of  Equation  (3.14)  is  negative.  This  can  be 
summarized  in  the  following  inequality: 


(Q 2  -  c22  )Pr(c2  )p(c2  I  *)  <  (C21  -  C„  )Pr(q  )p{Cl  \  x) 

Cl 


(3.15) 


which  can  be  expressed  ratiometrically  as 


.p(c1  I x)  (Q2  ~  ^22  )Pr(c2) 

p(c2|x)cl  (C21  -Cn)Pr(c,)  ' 


(3.16) 


This  is  typically  called  the  Bayes  test  for  minimum  cost.  By  comparing  Equation  (3.16)  with 
Equation  (3.6),  it  can  be  readily  seen  that  this  test  is  a  likelihood  ratio  test  with  a  threshold  that  is 
different  from  the  one  used  in  Equation  (3.6).  In  particular,  the  selection  of  cost  functions  is 
equivalent  to  changing  the  a  priori  probabilities  Pr(ci).  It  is  interesting  to  note  that  Equation 
(3.16)  is  identical  to  Equation  (3.6)  when  Cn  -  Cn  =  Cn  -  Cn,  which  is  called  a  symmetrical 
cost  function.  Here,  the  cost  becomes  the  probability  of  error,  and  the  test  inherent  in  Equation 
(3.16)  minimizes  this  probability  of  error.  In  a  two-class  problem,  when  a  wrong  decision  for 
one  class  is  more  critical  than  for  the  other  class,  an  asymmetrical  cost  function  must  be 
employed. 
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3.3.  Refinement  of  Bayesian  Classifier  Output 

Machine  learning  (ML)  las  become  an  important  part  of  research  in  artificial  intelligence, 
image  and  signal  processing,  and  database  research.  In  statistical  pattern  recognition,  one  applies 
probability  theory  and  decision  theory  to  a  statistical  model  of  the  input  to  obtain  a  classification 
algorithm.  This  is  opposed  to  using  training  data  to  select  among  different  algorithms,  or  to 
employing  common-sense  heuristics  to  design  an  algorithm.  In  classification,  one  assigns  a  class 
identifier  to  a  measurement,  or  equivalently,  identifies  the  probabilistic  source  of  a  measurement. 
Here,  one  needs  only  the  conditional  model  of  the  class  variable  given  the  measurement,  which 
can  be  obtained  from  a  conditional  model  or  learned  directly.  The  former  approach  is 
generative,  since  it  models  the  measurements  in  each  class,  but  requires  more  work,  can  exploit 
more  a  priori  knowledge,  needs  less  data,  is  more  modular,  and  can  handle  missing  or  corrupted 
data.  Applicable  techniques  include  mixture  models  and  hidden  Markov  models.  The  latter 
approach  (direct  learning)  is  discriminative  since  it  focuses  only  on  distinguishing  between 
classes.  This  can  be  more  efficient  in  runtime  mode  (i.e.,  pattern  classification  subsequent  to 
training)  and  requires  fewer  modeling  assumptions.  Typical  techniques  include  logistic 
regression,  generalized  linear  classifiers,  and  nearest-neighbor  classification.  Based  on  our 
research,  we  prefer  direct  learning  due  to  the  previously-  listed  advantages,  as  well  as  its 
flexibility  and  reduced  reliance  on  a  priori  models  (which  are  often  brittle  in  ATR  practice). 

In  Bayesian  model  selection,  one  chooses  the  model  that  assigns  the  highest  probability  to 
the  data  after  all  parameters  have  been  removed  (e.g.,  via  integration).  In  contrast,  maximum 
likelihood  estimates  parameter  values  that  maximize  the  classifier’s  likelihood  function.  This 
ignores  the  prior  distribution  and  thus  is  inconsistent  with  Bayesian  theory,  but  works  well  in 
practice.  Alternatively,  maximum  a  posteriori  (MAP)  heuristics  seek  parameter  values  that 
maximize  a  parameter’s  posterior  density.  This  implies  the  Bayesian  assumption  of  a  prior 
distribution  over  the  parameter.  Although  MAP  estimation  has  generally  high  accuracy,  it  tends 
to  predict  future  data  less  accurately  than  maximum  likelihood  techniques.  A  better  Bayesian 
method  for  parameter  estimation  is  predictive  estimation.  Also,  MAP  is  sensitive  to 
reparameterization,  which  is  central  to  adaptivity  in  the  proposed  GASP  scheme  of  ML. 

An  important  implementational  technique  in  ML  is  the  induction  of  decision  trees,  which 
tend  to  be  more  comprehensible  to  humans  than  other  learning  models  such  as  neural  networks 
[Gro99].  Decision  trees  efficiently  solve  the  ATR  classification  problem  (i.e.,  predicting  a  class 
identity  based  on  target/feature  attributes),  since  classification  requires  only  a  linear  walk  down 
the  tree.  Bagging  and  Boosting  [Die98]  are  efficient  techniques  for  improving  classification 
accuracy  of  a  single  decision  tree,  by  generating  multiple  learning  methods,  then  voting  on  their 
output. 

Grossman  and  Williams  [Gro99]  improved  the  Bagging  algorithm  introduced  by  Breiman 
[Bre94],  based  on  the  ID3  decision  tree  algorithm  with  chi-squared  pre-pruning.  Learning 
methods  are  weighted,  where  weights  are  based  on  the  classification  accuracy  of  each  learner  on 
samples  not  selected  during  the  bootstrapping  process,  which  generates  random  training  sets  for 
each  learner  from  the  training  data.  In  contrast,  Hashem  [Has94,Tum96]  uses  least- squares 
regression  to  discover  weights  that  iraximize  classification  accuracy.  Additionally,  Indurkhya 
and  Weiss  tested  alternate  decision  trees  based  on  pruning  by  statistical  significance  levels,  on  a 
subset  of  the  training  data  to  find  the  best-performance  tree  [Ind98].  They  also  employed  a 
voting  mechanism  based  on  this  tree. 
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The  work  of  the  preceding  authors  was  based  on  the  assumption  that  testing  on  a  verification 
set  estimates  the  future  performance  of  a  decision  tree.  Grossman  and  Williams  generate  a 
sufficient  number  of  multiple  trees  by  choosing  bootstrap  sets  from  the  original  training  data. 
They  use  a  dynamic  weighting  function  for  estimating  the  goodness  of  each  tree  in  the  ensemble, 
but  do  not  discard  trees  in  the  ensemble  that  perform  below  the  goodness  threshold  prior  to 
voting. 

Although  theory  for  a  posteriori  refinement  of  decision  free  based  classification  has  advanced 
considerably  [ScF97,Fri98],  there  appears  to  be  no  easy  method  for  increasing  the  adaptability  of 
such  classifiers  or  follow-on  multi-classifier  output  disambiguation  procedures.  For  example, 
classification  with  Dempster- Schaefer  theory  has  been  based  on  assumptions  of  stationary 
Markov  models  [Cla91].  However,  practical  target  recognition  problems  involve  nonstationary 
input,  to  which  successful,  robust  classifiers  must  adapt.  As  mentioned  in  previous  sections,  a 
key  disadvantage  of  Bayesian  approaches  is  the  requirement  of  significant  a  priori  knowledge,  a 
statistical  model  of  the  input  that  must  incorporate  key  features  that  are  themselves  input  to  the 
pattern  classifier(s),  and  decision  structures  that  must  be  adapted  in  a  supervised  method  -  a 
globally  optimal  construction  method  is  presently  unclear.  Additionally,  die  noise  tolerance  of 
Bayesian  classifiers  has  only  been  recently  reported  for  experimental  ATR  scenarios  [GreOO], 
and  Bayesian  classifiers  do  not  necessarily  adapt  well  to  nonergodic  inputs  -  for  example,  the 
vast  majority  of  reports  in  the  literature  describe  designs  based  on  ergodic  Markov  processes 
[WanOO,Got98  ,Bor96] . 

Traditionally,  the  adaptability  of  pattern  recognizers  is  conceptualized  as  having  two 
modalities,  which  we  have  previously  called  build-time  and  runtime.  During  build  time,  the 
classifier  can  be  adapted  offline  (i.e.,  trained)  on  a  static  dataset.  Input  changes  can  be  tracked 
by  maintaining  a  history  of  input  statistics.  During  runtime,  the  classifiers  and  their  associated 
output  disambiguation  or  postprocessing  routine(s)  are  applied  to  input  data  in  the  usual  manner. 
To  remedy  well-known  problems  of  lack  of  adaptivity  associated  with  traditional  rule-based 
classifiers,  we  have  employed  a  novel  multi-classifier  approach.  In  particular,  we  have  assisted 
FTI  in  employing  a  morphological  NN  to  process  the  TNE  agreement  map  columns,  which 
correspond  to  preliminary  classifier  output,  as  discussed  in  Section  2.  It  has  been  shown  that 
MNNs  perform  superiorly  to  classical  NNs  [Rit97,98].  MNN  technology  is  further  discussed  in 
Section  5. 

In  a  possible  Phase  2  effort,  we  propose  to  assist  FTI  in  further  extending  the  basic  concept 
of  Grossman  and  Williams’  technique  to  include  adaptive  classifier  refinement  techniques  based 
on  UF’s  MNNs  and  FTI’s  TNE  paradigm.  As  in  the  current  study,  the  fusion  and 
disambiguation  of  multiple  classifier  outputs  can  be  modulated  by  classifier  performance  data 
determined  by  standard  ATR  performance  analysis  metrics  (e.g.,  Pd  and  Rfa).  This  closed- loop 
modulation  of  the  classifier  refinement  process  supports  adaptation  to  performance  challenges 
such  as  input  nonergodicities  via  a  backpropagation  algorithm  that  closely  resembles  established 
learning  procedures  that  have  been  successfully  applied  to  classical  NNs.  Observe  that  accurate, 
nontrivial  free-  or  rule-based  classifiers  that  continuously  adapt  to  fluctuations  in  input  statistics 
have  yet  to  be  reported  in  the  literature.  Similar  in  overview  to  the  implementation  of  traditional 
classifiers,  MNN,  and  TNE-based  classifiers  are  illustrated  notionally  in  Figure  3.1.  However, 
the  combination  of  adaptability  and  accuracy  inherent  in  the  joint  MNN/TNE  classifier 
represents  a  significant  development  in  pattern  recognition  technology.  The  concept  of  using 
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MNNs  to  drive  adaptation  of  TNE-based  pattern  recognition  represents  an  important  advance 

over  traditional  open- loop  ATR  systems. 

A  second  use  of  MNNs  would  be  in  supporting  target  cluster  analysis  and  optimization  in  the 
template  database.  In  practice,  targets  tend  to  exhibit  similarity  at  slightly  different  poses,  due  to 
tangent  projection  geometry.  That  is,  a  tank  rotated  out-of-plane  by  five  degrees  will  appear 
visually  similar  to  the  same  tank  rotated  out-of-plane  by  10  degrees.  A  preliminary  estimate  of 
target  similarity  can  be  arrived  at  by  application  of  a  Euclidean  distance  function  to  the  M  target 
exemplars,  each  of  which  has  K  pixels,  thereby  incurring  work  of  0(KM1 2)  multiplications  and 
0(M2)  square  roots.  Alternatively,  an  MNN  could  be  configured  to  examine  the  target 
exemplars,  taking  into  account  factors  such  as  background  mean,  spatial  and  grayscale  variance, 
or  feature  orientation.  Each  of  these  measures  could  be  determined  by  an  MNN,  and  could 
provide  useful  information  about  target  similarity  beyond  the  distance  functions  typically  (but 

not  necessarily)  employed  by  TNE. 
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Figure  3.1.  Experimental  configuration  for  competitive  performance 
analysis  of  classifier  refinement  algorithms. 

3.4.  Utility  of  Bayesian  Classification  for  ATR 

The  preceding  discussion  highlights  several  features  of  Bayesian  decision  paradigms: 

1 .  Coverage  must  be  complete  -  that  is,  all  outcomes  must  be  known.  This  means  that  Bayesian 
classifiers,  as  presented  in  the  classical  Bayesian  theory  of  this  section,  cannot  accurately 
interpolate  or  extrapolate  over  missing  data. 

2.  Probabilities  must  be  computed  accurately,  if  a  given  outcome  is  unknown  or  guessed  at, 
then  the  a  priori  probability  will  be  erroneous,  which  will  influence  the  computation  of  a 
posteriori  probabilities  and,  therefore,  the  associated  likelihood  ratio(s).  This  means  that 
Bayesian  cost  minimization  will,  in  the  affected  instances,  be  erroneous  and  thus, 
nonoptimal.  In  practice,  therefore,  the  presence  of  erroneous  probabilities  mean  that  the  cost 
function  cannot  be  minimized. 


20 


STTR  Army02-T004  Topic  Title:  Analysis  and  Characterization  of  Pattern  Classifiers; 

GASP  -  Generator  for  Adaptive  Statistical  Pattern  Recognition  Systems  January  31 ,  2003 

3.  A  given  classifier  comprised  of  a  fixed  system  of  Bayesian  rules  cannot  readily  be  re¬ 
parameterized  to  include  additional  classes,  rules,  or  instances  of  input  patterns.  If  such 
additional  constraints  are  introduced,  then  the  prior  knowledge  changes,  which  in  each 
nontrivial  case  modifies  the  a  posteriori  probabilities.  As  in  item  2,  above,  this  affects  the 
likelihood  ratios  and  cost  function  minimization. 

The  preceding  features  represent  significant  limitations  of  classical  Bayesian  pattern  classifiers. 

Reasoning  under  uncertainty  and  incomplete  data  are,  however,  supported  by  FTI’s  TNE 
paradigm,  which  is  discussed  in  Section  4.  Further  support  for  such  realistic  constraints  is 
provided  by  neural  network-based  pattern  classifiers,  in  particular,  UF’s  Morphological  Neural 
Networks,  which  is  discussed  in  Section  5.  In  particular,  neural  nets  have  well-known 
capabilities  of  accurate  pattern  classification  given  partial  or  noise-corrupted  inputs,  and  are 
extensible  in  the  sense  that  they  can  learn  new  patterns  and  classification  rules.  Thus,  we  next 
discuss  these  paradigms,  then  overview  their  joint  application  to  ATR  in  Section  6.  A  summary 
of  advantages  and  disadvantages  of  each  approach  is  given  in  Section  7. 
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4.0  TNE  Pattern  Recognition  Paradigm 

Tabular  Nearest  Neighbor  Encoding  (TNE)  is  an  advanced  pattern  recognition  paradigm  that 
employs  mapping  operators  to  construct  an  intermediate  data  array  called  the  agreement  map 
(AM)  from  prior  knowledge  applied  to  input  data.  The  AM  is  then  processed  (for  example,  via 
thresholding)  to  yield  a  Boolean  data  structure  that  can  be  subjected  to  bitwise  or,  and,  as  well  as 
xor  operations,  followed  by  row  or  column  summation  to  yield  classification  scores.  Because 
TNE  is  based  on  an  efficient  projection  technique  that  performs  preliminary  detection  of 
candidate  pattern  features  in  hyperspace,  and  because  this  projection  is  implemented  in  terms  of 
I/O  operations  only,  TNE  is  significantly  more  efficient  than  template  matching  techniques  based 
on  arithmetic  operations.  Section  4.1  presents  basic  theory  of  TNE,  while  Section  4.2  compares 
TNE  with  previous  and  related  research.  Section  4.3  contains  several  applicative  examples  and 
complexity  analysis.  Section  4.4  presents  a  heuristic  overview  of  the  paradigm  applied  to  multi¬ 
source  data  fusion. 

4.1.  Mathematical  Description  of  TNE 

We  begin  with  an  overview  of  TNE  concepts,  then  progress  to  the  TNE  algorithm.  Let  an  N- 
pixel  input  (e.g.,  a  J~N  x-Jn  -pixel  source  image  ae  Z^f)  be  mathematically  subdivided  into  K- 

pixel  sampling  blocks,  of  which  there  are  N/K  such  blocks  or  test  vectors.  Each  vector  be 

has  K  vector  components.  Sampling  blocks  can  be  rectangular  or  noncontiguous,  and  the  vector 
components  can  be  groups  of  pixels,  streaming  audio  or  video  subsamples,  etc. 

Further  assume  that  there  exists  a  database  c  of  M  reference  patterns,  each  having  K  pixels 

and  being  configured  in  the  same  manner  as  b.  Thus,  c(0 s  Z^ ,  1  <i<M  .  For  example,  if  b 

is  comprised  of  square  blocks  of  -J~E  x*J~K  pixels,  then  each  exemplar  b (i)  in  c  has 
'Tk  x-J~K  pixels. 

In  traditional  pattern  recognition  applications,  a  technique  called  exhaustive  template 
matching  compares  each  K-pixel  source  block  b  with  each  K-pixel  exemplar  in  c,  requiring  work 
of  at  least  NMK  pixels  per  source  image,  if  all  sampling  block  phase  shifts  are  accounted  for.  If 
one  applies  such  operations  to  large  images  with  many  reference  patterns  (e.g.,  N  =  106,  M  =  105, 
K  =  102),  then  prohibitive  work  (e.g.,  NMK  =  10* 3  comparison  operations)  is  incurred. 

In  contrast,  TNE  precomputes  a  projection  D  that  stores  all  possible  outcomes  resulting  from 
application  of  the  M-exemplar  pattern  database  to  each  possible  source  block  configuration.  For 
example,  if  the  source  value  set  is  Zn,  then  there  are  nf  possible  configurations  of  each  source 
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block.  Because  TNE  uses  I/O  operations  only  to  implement  p,  in  a  processor  having  fast  I/O, 
this  approach  is  significantly  more  efficient  than  using  comparison  operations.  We  next  present 
theoiy  that  describes  the  TNE  algorithm,  in  particular,  die  projection  D  that  supports  TNE’s 
implementational  efficiency. 

4.1.1.  Assumptions.  Let  an  image  a  e  F*  be  subdivided  by  an  indexing  function  h  to  yield  a 
collection  of  K-pixel  sampling  blocks  A  =  {b(y) :  y  €  Y} ,  where  YcX.  Let  a  codebook  c  be 
formed  from  A,  such  that  c  contains  M  K-pixel  exemplars,  each  of  which  represent  a  cluster  Cu 
where  i  =  1..Q.  Let  a  feature  space  representation F  have  axes  Bi,  B2, ...,  Bj, ...,  BP,  to  which  are 
projected  each  of  the  clusters  Q,  thereby  producing  a  collection  of  intervals  denoted  by 

/  =  {Ijj€  R2  :l<i<Mand  1  <  j  <  P} . 


Boxes  which  project  onto  level  k 


cm 

□  ! 


in  □0=1 

□ 

- Training  Set  Boxes 


Possible  levels,  component  Xn 


Level  K  boxes: 


2,  5,  6,  9, 14, ... 


Level  K  Binary  vector:  01001100100001... 

Figure  4.1:  Derivation  of  binary  pointer  vectors  for  each  quantization  level  along 
_  each  dimension  of  a  highly- dimensional  feature  space.. 


This  projection  process  is  illustrated  notionally  in  Figure  4.1,  in  which  the  implementational  term 
binary  pointer  vector  corresponds  to  a  projection  of  multiple  pattern  clusters. 

4.1.2.  Algorithm.  Let  a  sampling  block  b(y)  be  represented  by  a  point  or  region  p  in  a  P- 
dimensional  feature  space  F.  Let  p  be  projected  to  axes  %  j  =  1..P,  to  yield  a  collection  of 
intervals  denoted  by 

J  =  {Jjj  6  R2  :1  <i  <  M  and  1<  j<  P} . 

Although  Jj  is  one-dimensional  when  p  is  a  point  (the  simplest  case),  we  employ  the  more 
involved  assumption  that  Jy  is  two-dimensional,  for  purposes  of  generality. 

Step  1.  Let  /  and  Jbe  processed  by  an  operation  that  compares  the  extent  of  Jy  with  the 
extent  of  Iy,  such  that  a  PxM-element  bitmap  d  is  formed,  as  follows: 

d/L  i)  =  J1  MJiJ^Plfli.pand  P2{\i)<P2^) 

[0  otherwise  ’ 

where  pk  denotes  projection  to  the  k*h  coordinate.  This  process  is  also  illustrated 
notionally  in  Figure  4.1. 
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Step  2.  As  an  example  of  pattern  recognition,  sum  d  rowwise,  and  subtract  P  as  to  yield 
scores  g(i)  =  P-Edj,  where  d;  denotes  the  ith  row  of  d.  The  resultant  scores 
equal  the  Hunming  distances  between  p  and  each  exemplar  c(i)  represented  by 
cluster  C\.  Associated  AM  generation  and  manipulation  steps  are  shown 
notionally  in  Figure  4.2. 

Step  3.  The  best- match  codebook  exemplars  are  given  by  c(domain(min  (g)));  the  choice 
function  can  be  used  to  select  one  best- match  exemplar  randomly  from  these 
exemplars.  An  example  matching  result  is  shown  schematically  in  Figure  4.3. 
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Figure  4.3:  TNE  pattern  recognition  process  combines  the  source  vector  with  the 
training  set  vector  in  the  context  of  the  agreement  map.  In  particular,  TNE  derives  a 
binary  agreement  vector  wherein  a  1  signifies  that  the  associated  components  agree 
within  a  prespecified  tolerance,  and  a  0  indicates  no  match. 
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4.1.3.  Complexity.  Assuming  that  the  codebook  cluster  projections  are  precomputed,  as 
mentioned  previously,  the  projection  of  p  to  the  axes  of  F  requires  O(P)  arithmetic  and 
transcendental  operations  per  source  block,  for  example,  P  sine  operations  and  2P  additions. 
Comparison  of  I  and  J  requires  2PM  comparisons  per  source  block,  with  P(M+1)  additions 
required  to  produce  g.  Similarly,  M  comparisons  are  required  to  find  the  best- match  exemplar  in 
c.  Hence,  the  work  required  by  TNE  codebook  search  over  a  is  given  by: 

Wine  =  N(M(2P  + 1)  comparison  s  +  P(M  +  3)  additions  +  P  transcende  ntals)  . 

4.1.4.  Observation.  In  previous  research  [Key99],  we  ha\e  presented  TNE  as  a  means  for 
implementing  fast  codebook  search  over  imagery  compressed  by  vector  quantization  (VQ). 
However,  the  TNE  algorithm  does  not  necessarily  compress  an  image,  but  primarily  provides  an 
efficient  means  for  codebook  search.  TNE  can  be  configured  to  compress  an  image  by  indexing 
each  sampling  block  according  to  the  spatial  configuration  of  its  grayscale  values.  That  is,  a 
given  pixel  (x,a(x))  of  a  provides  both  spatial  and  grayscale  information  to  a  map 
D: XxR  — >G,  where  R  denotes  domain^ a)  and  G  is  an  indexed  set  of  pointers  to  Mbit 
Boolean  vectors  stored  in  database  D.  Each  vector  represents  one  of  the  K  pixels  of  a  given 
sampling  block  b.  In  the  resulting  MxK-pixel  array  d,  which  is  called  the  agreement  map,  die  j- 
th  column  represents  a  bit  vector  of  binary  matching  scores  between  (1)  value  b(x)  at  position  x 
of  domain(b)  indexed  by  j,  and  (2)  all  exemplar  values  c(i)(x),  where  i  =1  ..M.  The  exemplar  that 
best  matches  b  is  given  by 

c(choice(domain(min(K  -  Ed;)))) , 
where  dj  denotes  the  i-th  row  of  d. 

4.1.5.  Remark.  It  is  easily  verified  that  precomputation  of  D  is  the  burdensome  step  in  the  TNE 
algorithm,  which  can  be  compared  to  the  overhead  of  codebook  construction.  For  example,  if 
each  sampling  block  has  K  pixels  each  having  m  greylevels,  then  irF  block  configurations  are 
possible.  Comparison  of  these  configurations  with  the  M  codebook  exemplars  yields  a  total  cost 
of  W  =  O(KMnf)  comparison  operations.  Given  small  values  of  K  =  64,  M  =  256,  and  m  = 
256,  it  is  easily  verified  that  W  is  prohibitively  large.  Hence,  it  is  reasonable  to  determine  the 
subset  S  of  the  rrF  block  configurations  that  occurs  in  a  given  training  set.  Given  S,  W  can  be 
reduced  to  0(KM'|S|)  comparison  operations.  For  example,  if  |S|  =  105  and  the  proportionality 
constant  in  the  complexity  estimate  of  W  is  set  to  unity  for  purposes  of  simplicity,  then  W  = 
2563x105  =  1.67  x  109  comparison  operations.  Derivation  of  the  working  set  S  is  a  topic  of 
current  research,  which  we  plan  to  emphasize  in  a  possible  Phase- II  effort. 

4.2.  Previous  Work 

4.2.1.  Observation.  As  noted  previously,  a  problem  with  VQ  codebook  search  has  been 
achieving  log(M)  time  complexity.  Efficient  codebook  search  algorithms  that  have  been 
published  in  the  open  literature  include: 

1)  n-ary  tree  search,  where  each  exemplar  is  rendered  in  progressive  n-fold  resolution 
reduction,  to  facilitate  multiresolution  representation  per  source  block  [Xu98] 

2)  Row  and  column  decimation  schemes,  whereby  a  sampling  block  is  characterized  h 
terms  of  a  signature  derived  from  each  row  or  column,  to  facilitate  comparison  of  fewer 
features  [E1S99]; 

3)  Statistical  search  techniques,  whereby  each  codebook  exemplar  and  source  block  are 
represented  by  parameters  such  as  mean,  standard  deviation,  skewness,  or  kurtosis 
[Pog93,Sch97]; 
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4)  Subpattern  characterization,  in  which  each  codebook  and  exemplar  is  subdivided  into 
partitions,  each  of  which  correspond  to  a  small  pattern,  thus  producing  a  hierarchy  of 
codebooks  [Che99];  and 

5)  Block  transformations  such  as  the  discrete  cosine  transform  employed  in  JPEG,  to  reduce 
each  encoding  block  to  a  vector  of  coefficients  that  can  be  quantized  and  downselected 
[Mey98], 

The  common  feature  of  each  search  method  is  the  reduction  of  an  exemplar  or  source  block  to  an 
approximate  representation,  which  can  be  thought  of  as  a  signature  or  feature  vector. 

4.2.2.  Remark.  Unfortunately,  hierarchical,  row/column,  statistical,  subpattem,  and 
transformation  based  representational  schemes  are  n-to-1  mappings  and  are  thus  lossy 
transforms.  As  a  result,  the  representation  of  a  given  source  block  may  not  be  sufficiently 
precise  to  avoid  confusion  between  exemplars.  Therefore,  one  usually  implements  a  multi-level 
matching  scheme  to  compensate  for  this  deficiency.  An  illustrative  example  follows. 

4.2.3.  Algorithm.  In  multi- level  matching,  given  a  source  block  b,  one  performs  the  following 
steps: 

Step  1.  Reduce  b  to  an  approximate  representation,  such  as  a  block  mean  or  simplified 
zero-crossing  pattern  derived  by  thresholding  b  about  its  mean. 

Step  2.  Compare  the  reduced  block  to  a  similar,  reduced  representation  of  each  codebook 
exemplar.  Alternatively,  the  block  representation  can  be  structured  to  point  to  a 
block  indices  in  a  list  of  Ne  probable  best- match  exemplars. 

Step  3.  If  the  htter  alternative  in  Step  2  is  employed,  then  each  of  the  H  exemplars 
would  be  compared  to  b,  yielding  a  best  match  according  to  a  prespecified 
matching  criterion. 

4.2.4.  Complexity.  Reduction  of  the  source  block  to  a  parametric  representation  typically 
requires  fsK  sampling  operations,  where  the  fraction  f  denotes  a  subsampling  ratio.  For 
example,  if  f  =  0.4,  then  two  out  of  every  five  source  pixels  are  sampled.  Table  4. 1  lists  the 
complexity  of  various  reduction  operations,  including  n-ary  tree  encoding.  If  a  given 
representational  system  uses  bj,  parameters  and  a  reduced  source  block  representation,  then 
comparison  of  Ne  exemplars  requires  at  least  NpNe  operations.  The  cost  of  different  comparison 
methods  is  listed  in  Table  4.2.  Observe  that  threshold  techniques  listed  in  Table  4.2  generally 
compare  a  source  block  value  v  to  determine  if  v  is  in  a  prespecified  interval,  which  can  be 
space-variant.  The  Sum  Threshold  (Product  Threshold)  comparison  sums  (resp.  takes  the 
product  of)  the  Boolean  values  that  represent  the  partial  thresholded  matches. 

Table  4.1.  Work  Wr  incurred  by  a  reduction  operation  applied  to  a  K-pixel  source  block  with 

sampling  factor 


Block  Oneration 

Additions 

Multiplications 

Comparisons 

Roots 

Mean  [i;  work  = 

i 

0 

0 

Std.  Deviation  c;  WCT 

[2fsKl-l 

ftKl+1 

0 

1 

kth  Central  Moment  m k 

[kfsKl-l 

f(k-l)fsKl+l 

0 

1 

n-ary  Tree  Encoding 

vfcKl  K 
i=0  n*  «i+1 

i K 

i=07Ji+1 

0 

0 

Table  4.2.  Work  Wm  incurred  by  a  matching  operation  applied  to  an  Np-parameter  source  block 

representation. 
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Block  Operation 

Additions 

Multiplications 

Comnarisons 

Roots 

Sum  of  Deviations 

Np-l 

0 

0 

0 

Mean  Deviation 

Np-1 

1 

0 

0 

Sum-Squared  Deviation 

Np-l 

Np 

0 

0 

RMS  Deviation 

2NP  - 1 

Np 

0 

1 

Normalized  Autocorrelation 

0 

2NP  +  1 

0 

0 

Normalized  Cross-correlation 

0 

3Np  +  1 

0 

0 

Sum  of  Thresholds 

Np-l 

0 

2NP 

0 

Product  of  Thresholds 

0 

Np 

2NP 

0 

The  mean  work  required  for  block  encoded  compression  via  these  types  of  reductionist 
codebook  search  techniques  can  thus  be  expressed  in  terms  of  blocksize  K,  mean  number  of 
exemplars  Ne  retrieved  during  primary  search,  and  the  work  (Wm)  required  by  a  block 
reduction  (matching)  operation,  as  follows: 

Wtot  =  Wr(f,  K)  +  Ne '  Wm(Np) . 

Note  that  Wm  is  not  weighted  by  £  (although  the  work  estimates  in  Table  4.1  could  be  so 
weighted),  since  computation  over  the  reduced  representation  is  already  efficient. 

4.2.5.  TNE-like  Pattern  Matching  Paradigms.  Ramasubramaniam  and  Paliwal  [Ram99] 
recently  published  a  search  paradigm  similar  to  TNE  that  is  amenable  to  joint  implementation 
with  vector  quantization.  Two  techniques  are  investigated  ( box-search  and  cell-partition 
search ),  which  are  based  on  Voronoi  projections.  The  cell-partition  method  uses  an  intersection- 
count  procedure  for  obtaining  a  set  of  candidate  vectors,  whereas  the  box- search  algorithm  uses  a 
bounding-box  interpretation  of  the  projection  information.  Since  it  does  not  maintain  an 
intersection  count,  box-search  requires  significantly  less  storage.  Both  techniques  produce  an 
optimum  set  of  codevectors.  The  authors  also  investigated  the  effect  of  principal  component 
rotation  and  found  that  it  reduces  the  complexity  of  codebook  search  based  on  Voronoi 
projection,  while  potentially  reducing  representational  error  in  the  reconstructed  (decompressed) 
data. 

4.2.5.I.  Box-search.  Given  codebook  c,  let  the  Voronoi  region  V;  associated  with  a  K- 
dimensional  codebook  exemplar  c;  contain  all  points  in  the  space  RK  that  are  nearer  to  Ci  than  to 
any  other  codevector.  If  a  test  vector  xs  Vi,  then  Ci  is  the  nearest  neighbor  of  x.  The  projection 
Pij  of  V  onto  the  j-th  coordinate  axis  of  RK  is  an  interval  with  upper(R)  and  lower  (L)  boundary 
points 

pij  =x^iPj(*)  and  Py  =^.Pj(x), 

where  pj  denotes  projection  to  the  j-th  coordinate.  The  preceding  bounds  represent  two 
hyperplanes  normal  to  the  jh  coordinate  axis.  Taken  together,  the  K  projections  P,j,  j  =  1..K, 
define  a  set  Bi  of  2K  hyperplanes  that  bound  V  as 

Bi  ={x:P^SPj(*)£PV,j  =  l..K}. 

Since  each  codevector  has  such  a  hypercuboid  approximation  to  V,  we  observe  that  Vj  c  Bi 
implies  that  a  given  test  vector  x  may  have  several  associated  candidate  codevectors  in  the  set 
C(x)  =  {cj :  x  e  Bi}.  Thus,  the  central  task  of  the  box-search  algorithm  is  to  locate  the  nearest 
neighbor  of  x  in  C(x). 

This  search  can  be  rendered  efficient  by  testing  each  of  the  K  coordinates  of  x  against  the 
bounds  of  Vj  that  corresponds  to  Cj  €  C(x).  If  the  j-th  coordinate  of  x  falls  outside  B,(j),  then  Ci  is 
rejected  as  a  candidate  codevector.  The  complexity  of  this  initial  search  step  is  thus  0(K  ■  |C(x)|) 
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comparisons.  The  reduced  set  C’  obtained  by  initial  search  of  C  is  then  subjected  to  distance 
computation,  thereby  incurring  O(K)  arithmetic  operations.  This  is  precisely  the  method 
employed  in  TNE  based  compression,  which  uses  a  Mahalanobis-like  distance  as  the  matching 
metric. 

4.2.5.2.  Cell-partition  search.  In  the  preceding  section,  N  overlapping  intervals  Py  generate  2N 
projection  boundaries  (y,i,  y)2,  ...,  y,n)  having  a  natural  ordering  yj,i  <  yj-,2  <  ...  <  yj,N  that 
partitions  the  jh  coordinate  axis  into  2N-1  contiguous  intervals  I  =  {Ij(l),  1(2),  ...  ,  f(2N-l)}, 
where  Ij(m)  =  (yhm,  y,m+i).  Each  interval  l(m)  is  associated  with  a  set  £J(m)  of  indices  for  V 
whose  projection  intervals  Py  can  overlap  with  F(m). 

Further,  the  2N-1  contiguous  intervals  on  each  of  the  K  coordinate  axes  partitions  RK  into 
(2N-1)k  hyperrectangular  cells.  Given  test  vector  x  =  (xj,  x2,  ...  ,  jfc),  if  f(mj)  contains  then 
SJ(mj)  contains  the  candidate  codevectors  that  could  be  nearest  neighbors  of  x.  The  test  vector  is 
thus  located  by  examining  K  intervals  f(mj),  j  =  1..K,  to  yield  the  hyperrectangular  cell 

H(x)  =  nij(mj). 

j=i 

Thus,  the  final  set  of  candidate  codevectors  is  given  by 

K 

S(x)=nsJ(mj)  . 
j=i 

This  development  suggests  to  techniques  for  finding  the  candidate  codevectors  C’(x).  First,  one 
can  entabulate  the  (2N-1)K  cells  in  a  table,  from  which  C’  is  obtained,  here  the  best- match  cell  is 
identified  using  the  preceding  equation.  Unfortunately,  this  requires  prohibitive  storage  that 
increases  exponentially  as  0(NK).  The  second  technique  precomputes  the  candidate  sets  S*(m) 
and  stores  them  in  K  mapping  tables.  Each  table  corresponds  to  one  coordinate  axis  and  has  2N- 
1  rows,  one  per  interval  Py.  This  requires  0(K(2N-1))  storage.  The  final  set  of  candidate 
codevectors  is  obtained  by  an  intersection  count  procedure,  which  requires  Klog(2N)  +  W 
operations,  where  K+N  <  W  <  (K+1)N.  This  is  similar  to  the  possible  TNE  matching  procedure 
of  summing  columns  of  the  agreement  map  to  yield  a  basis  for  Hamming  distance  computation. 
Such  summation  can  be  applied  to  single  columns  or  to  columns  grouped  via  rowwise  operations 
such  as  Boolean  or,  and,  or  xor. 

4.2.5.3.  Principal  component  rotation.  Ramasubramanian  and  Paliwal  state  that  the  application 
of  principal  component  rotation  to  the  K-dimensional  exemplar  space  referred  to  in  Sections 
4.2.5. 1  and  4.2.5.2  results  in  a  consistently  lower  complexity  for  box-search  than  die  unrotated 
case.  This  occurs  since  the  projections  of  the  Voronoi  regions  obtained  with  the  rotated 
coordinates  significantly  reduce  the  overlap  between  bounding  boxes  B,  thereby  reducing  the 
size  of  set  C’.  In  contrast,  the  storage  and  computational  overheads  of  cell-partition  search  are 
significantly  increased,  since  coordinate  rotation  increases  the  average  index  set  size.  An 
additional  effect  of  rotation  is  increased  signal- to-noise  ratio  (i.e.,  decreased  reconstruction 
error).  Again,  this  is  due  to  reduced  overlap  between  Voronoi  regions.  Also,  the  authors  report 
that  rotation  groups  the  SNR  closer  to  that  obtained  via  full  search  for  all  codebook  sizes.  We 
plan  to  investigate  application  of  this  technique  to  TNE-based  compression  in  the  near  future. 
However,  Ramasubramaniam  and  Paliwal’s  box-search  algorithm  does  not  of  itself  appear  to 
have  advantages  over  TNE,  which  generalizes  the  box-search  technique. 
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Figure  4.4.  Notional  diagram  of  spatial  partitioning  strategy  for  TNE/VQ  image 

compression. 


4.3.  Examples  and  Analysis 

A  useful  example  application  of  TNE  for  purposes  of  initial  illustration  is  image 
compression,  with  feature  or  object  recognition  over  compressed  imagery.  In  this  application, 
TNE  can  be  thought  of  as  involving  three  steps:  (1)  image  database  partitioning,  (2)  codebook 
construction,  and  (3)  codebook  search,  which  are  described  as  follows. 


4.3.1.  Partitioning.  As  described  previously,  each  source  frame  or  image  can  be  partitioned  into 
K-pixel  encoding  blocks.  Each  encoding  block  can  be  compressed  in  two  ways.  First,  the 
number  of  source  pixels  Nsp  can  be  reduced  to  Ncp,  leading  to  a  domain  compression  ratio  CRd 
=  Nsp  /  Ncp.  Second,  the  number  of  bits  per  source  pixel  Nsb  can  be  reduced  to  Ncb  bits  per 
pixel,  leading  to  a  range  compression  ratio  CRr  =  NSb  /  Ncb-  We  have  elsewhere  shown  that  the 
net  compression  ratio  is  given  by 

CR  =  CRd-  CRr. 
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Figure  4.5.  Training  images  (256x256  pixels):  (a-e)  geometric  shapes,  (f-g) 
images  with  text,  (h-j)  natural  scenes,  and  (gj-1)  images  containing  people  or 
animals.  Note  that  lena  is  represented  only  in  g)  and  k). 


Since  blocksize  directly  impacts  the  domain  compression  ratio  CRd,  this  also  affects  CR. 
Thus,  larger  blocks  tend  to  increase  the  compression  ratio,  albeit  at  the  expense  of  visual  image 
quality  due  to  blocking  effect.  This  process  is  illustrated  in  the  notional  diagram  of  Figure  4.4. 
This  illustration  can  also  depict  multiresolution  partitioning,  which  is  possible  with  TNE.  In  the 
context  of  military  applications,  ATR  in  conjunction  with  multiresolution  image  registration  and 
compression  is  a  topic  of  our  ongoing  TNE  research,  which  we  propose  to  further  explore  as  a 
subtask  in  a  possible  Phase-II  effort. 

4.3.2.  Codebook  Construction.  The  codebook  employed  in  the  TNE  compression/pattem 
recognition  algorithm  discussed  herein  was  constructed  from  the  diverse  training  set  shown  in 
Figure  4.5.  Note  that  this  training  set  is  not  necessarily  well  suited  to  the  lena  test  image,  but  has 
been  chosen  for  its  generality.  The  codebook  was  constructed  using  the  standard  Lloyd-Max 
algorithm,  since  codebook  construction  efficiency  is  not  a  consideration  in  this  study. 

4.3.3.  TNE  Codebook  Search.  Since  TNE  primarily  employs  bit  operations,  execution  is 
efficient  at  the  level  of  feature  vectors.  The  current  prototype  of  TNE  is  coded  in  Microsoft 
FORTRAN  (32-bit  architecture).  The  host  machine  for  timing  benchmarks  listed  herein  is  based 
on  an  Intel  Pentium  processor  running  at  133  MHz,  with  a  256  KB  cache.  In  previous  tests  on  a 
Pentium  II  processor  running  at  266  MHz  with  a  512  KB  cache,  the  execution  times  were 
reduced  by  a  factor  of  approximately  4.3.  In  a  completed  implementation,  TNE’s  execution  time 
could  be  significantly  decreased  by  coding  in  C  or  assembly  language.  Benchmark  times  cited 
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below  are  for  encoding  (i.e.,  discovery  and  assignment  of  a  best- match  exemplar  index  for  each 
source  block  in  an  image)  as  well  as  decoding  (projecting  the  exemplar  indexed  by  a  compressed 
pixel  onto  the  corresponding  source  domain).  On  a  Pentium  133  MHz  processor,  the  decoding 
step  required  approximately  0.3  seconds  for  a  5 12x5 12-pixel  image,  for  exemplar  size  ranging 
from  4x4  pixels  to  16x16  pixels.  Further  details  of  execution  times  as  a  function  of  exemplar 
and  codebook  size  are  given  in  [Key99],  where  we  showed  that  TNE  offers  a  speedup,  as 
opposed  to  search  based  on  a  Euclidean  distance  matching  criterion,  ranging  from  52.8:1  to 
129.8:1,  which  is  a  typical  performance  gain  for  the  type  of  imagery  and  computer  employed  in 
these  tests.  As  M  increases,  the  resultant  speedup  tends  to  increase,  since  the  work  required  for 
TNE  is  O(log  M),  while  Euclidean-based  search  incurs  at  best  0(M)  work. 

We  next  consider  ATR  operations  over  TNE  encoded  imagery. 

4.3.4.  Example  Application:  TNE-based  ATR.  TNE  can  be  applied  to  target  recognition  by 
constructing  an  agreement  map  based  on  a  database  of  target  templates.  As  described  in  Section 
3.1,  die  TNE  encoding  of  each  target  template  comprises  a  column  of  the  agreement  map 
selected  from  the  template  database  according  to  source  pixel  greylevel  and  location  within  an 
encoding  block,  as  shown  in  Figure  4.6.  Bit  operations  on  the  agreement  map  columns,  or 
between  groups  of  columns  in  the  agreement  map,  comprise  the  analogues  of  an  image  template 
correlation  operation. 

In  practice,  TNE-based  target  classification  is  an  analogue  of  a  template  matching  operation 
over  the  range  space  of  a  TNE-encoded  image.  In  TNE,  target  templates  are  encoded  in  the 
agreement  map  columns,  and  matching  is  performed  over  a  TNE  encoding  of  each  source  block. 
This  is  exemplified  in  Figure  4.7,  which  demonstrates  the  superiority  of  TNE’s  matching 
technique  versus  Euclidean  distance  based  classification. 


The  binary  pointer  table  facilitates  construction 

of  the  Agreement  Map  based  on  stored  descriptors 

keyed  to  component  values  of  the  test  vector.  Q  exemplars 


Figure  4.6:  Schematic  diagram  of  TNE  as  a  target  classifier. 
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It  is  possible  to  configure  TNE  hierarchically  to  produce  more  efficient  search.  For  example, 
one  could  apply  an  operation  O  to  a  group  of  columns  in  an  agreement  map  mo  to  produce  a 
reduced  agreement  map  mi,  then  apply  mi  to  a  source  image  a  to  produce  approximate  target 
classification,  in  categories  Ci  =  {cn,  ci2,  ...  ,  ci„}.  Without  applying  O  to  mo,  one  has 
categories  Co  =  {  coi,  C02, ... ,  com },  where  co,  <=  cij  for  some  1  <  i  <  m  and  1  <  j  <  n,  with  n<m. 
In  preliminary  tests,  this  clustering  procedure  produces  accurate  classification  of  targets  in  noisy 
imagery,  similar  to  the  results  shown  in  Figure  4.7.  We  expect  to  analyze  and  present  related  test 
results  in  a  future  publication. 


4.3.5.  Example  Application:  TNE-Based  Compression.  Application  of  TNE  encoding  and 
search  techniques  to  the  training  imagery  shown  in  Figure  4.5  yielded  an  agreement  map  that 
was  input  to  a  TNE  classification  algorithm  configured  for  image  compression.  The 
compression  results  for  the  Lena  image  are  shown  in  Figure  4.8,  and  performance  of  the  TNE 
algorithm,  expressed  as  MSE  versus  CR,  is  graphed  in  Figure  4.9a.  The  latter  figure  instantiates 
the  relationship  between  CR,  MSE,  and  codebook  size  M,  which  is  not  apparent  in  the  small 
sample  shown  in  Figure  4.8. 
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Source  Image 
4eri$: 


Blockslze:  6>c8  pixels 
M  -  256  CR  —  64:  i 
vMSB  =  0  0377 


\ 


Blockslze:  4x4  pixels 
M  =  256  CR.=  16:1 
MSE:  o;022 


i  locksize:  Bk6  pixels 
M=  1024  CR  =  51  2  1 
MSE:  0.0355 


Figure  4.8:  Example  TNE  compression  results  on  Lena  image,  where  M  denotes  codebook 
size,  mean-squared-error  (MSE)  is  expressed  as  a  fraction  of full-scale  intensity 
over  the  interval  [0,1],  and  CR  denotes  compression  ratio. 


In  Figure  4.9a,  MSE  is  shown  to  \ary  slightly  more  than  O(CR),  thus  appearing  to  grow 
more  slowly  at  higher  compression  ratios.  This  phenomenon,  which  is  common  to  block- 
oriented  compression  transforms,  is  due  to  the  rapid  loss  of  high-frequency  detail  at  low  to 
moderate  compression  ratio  (quantization  in  small  to  moderate  blocks),  followed  by  the  gradual 
loss  of  low-frequency  detail  as  blocksize  increases.  This  is  a  physically  valid  observation,  since 
low- frequency  components  predominate  in  natural  imagery,  where  selected  spatial  information 
tends  to  scale  as  1  If  where/denotes  spatial  frequency  [Wu94].  For  example,  compare  the  result 
obtained  in  Figure  4.8  with  CR  =  16:1  at  blocksize  4x4  pixels  with  the  result  for  lOxlO-pixel 
blocks  at  CR  =  1 14.3:1.  The  codebook  sizes  are  within  a  fector  of  two;  hence,  the  decompressed 
images  are  comparable.  Note  the  loss  of  high-frequency  detail  and  the  induction  of  blocking 
effect  in  the  latter  image.  As  discussed  previously,  blocking  effect  is  a  severe  problem  in  VQ- 
compressed  imagery,  which  limits  the  achievable  compression  ratio  to  approximately  200:1  for 
512x512  pixel  imagery,  which  has  lOxlO-pixel  blocks.  The  degradation  of  image  quality  as  a 
result  of  blocking  effect  is  noted  in  Figure  4.9a,  and  measures  of  such  boundary  effects  are 
discussed  in  [Cai98],  to  which  the  reader  is  referred  for  further  detail. 
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Figure  4.9:  Example  TNE  compression  performance  on  5 12x5 12-pixel  Lena  image:  (a)  MSE 
versus  CR  for  codebook  size  M  ranging  from  64  to  4096  exemplars,  and  (b)  encoding 
times  for  Lena  imagery  of Figure  4.9  on  a  1 3  3 MHz  Pentium  running  Microsoft 
Fortran. 


4.4  Heuristic  Overview  of  TNE  Paradigm  for  Data  Fusion  and  ATR 

We  assume  that  the  ATR  process  requires  the  comparison  of  multiple  data  blocks  or  test 
vectors  from  different  sensor/classifier  assets  with  a  library  of  sought  after  target-related 
exemplar  or  training  set  vectors.  We  proceed  by  placing  acceptance  bounds  around  each 
component  value  of  each  test  vector  collected  and  determine  the  number  of  components  from 
each  training  set  exemplar  that  fall  within  these  bounds.  We  generate  a  binary  vector  for  each 
training  set  exemplar  as  was  shown  in  Figure  4.3.  This  binary  vector  indicates  the  components 
of  the  test  vector  that  agree  with  a  particular  database  exemplar.  Note  that  the  intervals  of 
agreement  can  be  all  equal  or  weighted  as  is  appropriate.  At  the  heart  of  the  TNE  paradigm  is  a 
procedure  for  very  efficiently  deriving  this  agreement  map  for  the  entire  library  without 
computation. 

As  showed  in  Figure  4.1,  we  construct  a  binary  vector  for  each  quantization  level  along  each 
coordinate  axis  (vector  block  dimension).  This  binary  vector  contains  a  bit  for  each  exemplar  in 
the  training  set.  If  the  N-space  box  corresponding  to  the  agreement  intervals  about  each  training 
set  exemplar  vector  point  projects 
onto  the  particular  quantization 
level,  a  1  is  assigned,  otherwise  a 
0.  The  utility  of  these  binary 
agreement  vectors  is  that  once  they 
are  constructed,  a  virtual 
agreement  map  for  any  new  test 
vector  can  be  specified 
immediately  by  simply  using  the  N 
test  vector  component  values  as 
pointers  into  the  larger,  pre-derived 
table.  Also,  as  shown  in  Figure 
4.10,  they  can  be  combined 
logically  to  facilitate  inferences 


Test  Vector  = 


t  Agreement  by  Component  between  Test  Vector 
117  And  Each  Training  Set  Exemplar 

Component  1  agreements  (“1”) 

•  (1 1 7)-*>  1 101 00001 0000001 1 0000001 0000000. . . 


Component  2  agreements  H") 

-*(  19)-*  0000010000101000100110001001100... 

Agreements:  component  1  AND  component  2 
00000000000000001 00000000000000. . . 

Agreements:  component  1  OR  component  2 
1101010010101001100110011001100... 

Figure  4.10  Simple  example:  a  two-dimensional  system 
(i.e.,  two  pixels  per  vector  block)  illustrates  the  utility  of 
the  binary  agreement  vectors.  Note  that  only  one 
Training  Set  exemplar  matches  both  components  of  the 
test  vector. 
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with  respect  to  test  vector  classification. 


4.4.1  Data  Fusion  in  TNE  Paradigm 

Note  that  the  process  outlined  above  is  not  limited  to  data  from  one  sensor  or  sensor-type. 
We  can  readily  accommodate  multiple  data  sources  by  ncreasing  the  dimensionality  of  the 
signature  vector  to  include  components  (e.g.,  samples,  pixels,  or  state  information)  from  all  the 
available  sources.  When  a  particular  source  is  not  present,  its  representation  in  the  search  space 
is  null. 

For  example,  if  three  sensors  collect  K,  M,  and  N  samples,  respectively,  then  the  resulting 
query  (test)  vector  will  contain  K+M+N  components.  We  illustrate  the  specification  of  the 
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Figure  4.11b:  The  TNE  Algorithm  associates  input  test  vectors  with  their  respective  training  set 
agreement  maps  (without  any  computation,  only  memory  accesses).  Multiple  test 
vectors  from  sensor  data  give  rise  to  sequences  of  agreement  maps  that  together 
contain  volume  clusters  of  agreement  events  resulting  from  the  data  collection.  These 
clusters  support  inferences  with  respect  to  target  ID  and/or  false  alarms. 
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agreement  map  via  the  TNE  algorithm  in  Figure  4.11a.  Figure  4.11b  illustrates  how  a  sequence 
of  observations  (i.e.,  test  vectors)  leads  directly  (and  at  very  low  complexity)  to  an  array  or 
sequence  of  AMs  associated  with  the  specific  collection  ssquence.  We  can  think  of  this  array 
(denoted  by  AM-1,  AM-2,  ...),  where  each  AM  corresponds  to  a  test  vector,  as  a  higher¬ 
dimensional  representation  of  agreement  events  indexed  by  training  set  exemplar  and  vector 
component,  to  be  statistically  analyzed  as  it  evolves  temporally.  This  3-D  binary  structure 
records  agreement  events  classified  throughout  a  collection  session.  Target  ID  and  false  alarm 
inferences  can  subsequently  be  derived  from  observed  clustering  of  the  agreement  events. 

Figure  4.12  illustrates  that  much  false  alarm  rejection  can  be  achieved  by  dithering  the  AMs 


8  by  8  pixel  patch  used  as 
|  test  vector.  Image  tiles  used 
;  as  training  set.  Noise  added. 


|§  Result  of  1  pixel  dither  up,  down,  right,  and 
H  left;  each  followed  by  binary  “AND”  operation. 
BE}  Then  resulting  maps  are  '  OR’ed"  to  produce 
“  ‘  this  result.  Note  the  noise  suppression^? 


Figure  4.12:  TNE  data  fusion  process:  Here,  as  an  example,  we  tile  an  image  into  8  by  8  pixel 
partitions,  which  for  this  example,  we  regard  as  a  Paining  set.  We  select  a  single  image 
partition  as  a  test  vector  and  add  noise  to  produce  20  realizations.  We  then  use  TNE  to 
generate  20  agreement  maps,  which  we  sum  and  display  as  shown  (lower  left).  We  then  dither 
each  AM  by  one  pixel  in  each  rectangular  direction  and  perform  a  binary  AND  operation  on 
each  pair.  Finally,  we  apply  the  binary  or  operation  on  the  four  results,  ie.,from  dithering 
left,  right,  up,  and  down,  as  shown  in  the  right-hand  column.  This  procedure  significantly 
reduces  noise  in  the  agreement  event  space.  The  resulting  bit  sum  measure  is  shown  at  lower 
right.  We  emphasize  that  these  results  are  obtained  with  little  computational  work. 
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by  one  pixel  and  applying  the  binary  and  operation.  We  have  found  that  this  procedure  exploits 
the  important  phenomenon  of  target  spatial  locality  in  ATR  imagery.  For  example,  we  dither  in 
each  pixel  direction  (e.g.,  NEWS)  and  then  apply  the  binary  or  operation  to  die  results.  The 
result  is  that  at  the  cost  of  a  moderate  number  of  Boolean  operations  on  a  binary  table,  TNE  thus 
yields  a  map  of  all  nonisolated  agreement  events  resulting  from  a  collection  sequence.  Note  too 
that  most  computer  systems  can  do  these  Boolean  operations  32  or  64  at  a  time  (i.e.,  by  applying 
the  binary  bit  operations  to  two  loaded  32-  or  64-bit  integers). 

4.4.2  Data  Fusion,  AM  Processing 

Pure  component  voting  of  the  agreement  map  entails  summing  each  column  to  ascertain  the 
number  of  agreements  (i.e.,  N- Hamming  distance)  per  reference  pattern.  As  indicated,  we  sum 
the  columns  of  the  agreement  map  to  obtain  a  voting  score  for  each  template  in  the  training  set. 
In  some  applications,  it  is  advantageous  to  first  apply  the  binary  or  operation  to  the  columns 
related  to  closely  correlated  templates,  for  example,  those  corresponding  to  two  configurations  or 
articulations  of  the  same  target,  prior  to  bit  summing.  In  cases  wherein  multiple  signal  vectors 
are  matched  to  clusters  of  templates  from  different  candidate  classes,  we  have  found  it  useful  to 
sort  these  sums  while  keeping  track  of  their  associated  target  IDs.  We  then  select  only  the 
maximum  few  percentiles  of  these  sums,  i.e.,  templates  that  had  a  relatively  good  agreement 
score,  and  sum  these  by  target  ID  or  target  class  as  desired. 

4.4.3  Alternate  Hardware  Implementations 

FTI  teamed  with  University  of  Florida  and  Harbor  Branch  Oceanographic  Institute  on  a 
contract  effort  (STTR  N98T-003)  with  the  Office  of  Naval  Research  (ONR)  to  inplement  TNE 
and  other  signal  processing  functions  on  reconfigurable  computers  (RCs),  particularly  field 
programmable  gate  arrays  (FPGAs).  Our  objective  in  this  effort  was  to  leverage  recent  and  on¬ 
going  team  foundational  work  to  produce  a  surveillance  processor  with  unprecedented  speed, 
robustness,  and  adaptability.  We  show  a  simplified  view  of  the  processor  architecture  in  Figure 
4.13.  As  shown,  each  FPGA  manages  a  portion  of  the  agreement  map.  The  components  of  the 
sensor  vector(s)  are  passed  to  each  FPGA  block  simultaneously  where  they  are  used  as  pointers 
into  the  stored  binary  table.  In  this  example  configuration,  column  sums  are  then  done  and  the 
best  result  reported.  Finally,  the 
global  best  match  is  readout  in  real¬ 
time. 


4.4.4  TNE-Driven  Inference  Engine 
To  illustrate  the  TNE  paradigm 
applied  to  a  sjstem  of  networked 
multiple  ATR  sensors  and  classifiers, 
consider  the  following  configuration. 
We  assume  that  each  asset  is 
continually  collecting  test  signature 
vectors  and  sending  these  directly  back 
to  a  TNE/MNN-based  inference 
engine  as  network  resources  permit. 

At  the  central  processing  point,  TNE  is 
applied  to  each  test  vector  from  each 
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Figure  4.13:  TNE  implemented  via  a  parallel  array 
_  of  FPGAs 
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asset  in  successive  time  slices.  Depending  on  the  signal  to  noise  ratio,  these  collected  vectors 
(data  blocks)  may  compare  poorly  with  any  training  set  exemplars.  This  will  result  in  random 
component  matches  in  the  resulting  agreement  maps,  with  an  absence  of  identified  targets.  After 
the  dither  process  described  and  illustrated  above,  the  AMs  would  be  expected  to  be  virtually 
null.  However,  as  the  targets  draw  nearer  to  one  or  more  recognizable  assets  (previously- 
mentioned  effect  of  spatial  locality ),  an  increased  incidence  of  agreements  with  associated 
training  set  exemplars  is  expected.  Here,  the  contributions  from  different  assets  may  be 
weighted  prior  to  their  combination,  to  account  for  changing  environmental  conditions,  in  order 
to  increase  Pd  and  decrease  Rfa.  The  utility  of  the  TNE  AM  structure  in  ATR  inference  is 
further  discussed  in  Section  6. 
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5.  MNN  Pattern  Recognition  Paradigm 

Morphological  Neural  Networks  (MNNs)  are  an  emerging  pattern  recognition  technology 
that  significantly  improves  upon  the  classification  accuracy  and  performance  characteristics  of 
classical  neural  nets  (CNNs).  Morphological  NNs  are  based  on  a  nonlinear  kernel  that  employs 
addition  and  maximum  in  a  maximum-of-sums  or  minimum-of-sums  configuration,  versus 
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classical  NNs  that  employ  multiplication  and  addition  in  a  sum-of-products  configuration.  As  a 
result  of  substituting  addition  for  multiplication,  die  computational  complexity  is  significantly 
reduced;  MNNs  can  compute  much  faster  than  their  classical  counterparts.  Also,  classical  NNs 
such  as  the  Hopfield  net  have  limited  information  storage  capacity  and  generally  unreliable 
pattern  classification  accuracy  for  all  but  a  few  highly  distinct  patterns.  In  contrast,  associative 
memories  based  on  MNNs  can  store  and  accurately  retrieve  the  theoretical  maximum  number  of 
patterns  (2N  Boolean- valued  patterns  of  length  N)  versus  only  0.1 5N  Boolean  patterns  for  the 
Hopfield  net  [Lip87].  Because  MNNs  are  based  on  an  efficient  kernel  and  converge  in  one  pass 
of  the  net,  they  have  short  training  times  that  support  the  use  of  MNNs  in  adaptive  pattern 
recognition  applications.  Section  5.1  presents  the  mathematical  theory  of  MNNs  and 
morphological  perceptrons,  while  Section  5.2  compares  MNNs  with  previous  and  related 
research.  Section  5.3  contains  several  applicative  examples  and  complexity  analysis.  In  Section 
6,  we  show  how  TNE  and  MNNs  can  work  together  to  improve  pattern  classification  efficiency 
and  accuracy  in  a  Bayesian  paradigm. 

5.1.  Mathematical  Description  of  MNNs 

We  begin  with  an  overview  of  CNN  and  MNN  concepts,  then  progress  to  MNN  perceptrons 
and  learning  algorithms.  It  is  well  known  that  content-addressable  memories  based  on  the 
Hopfield  net  have  two  key  deficiencies.  First,  the  number  of  patterns  N  that  can  be  stored  and 
accurately  recalled  is  severely  limited.  If  N  is  too  large,  then  the  net  may  converge  to  a  spurious 
pattern  that  differs  from  all  stored  exemplars,  which  produces  a  “no- match”  output  in  classifier 
applications  [Lip87].  Hopfield  remedied  this  defect  by  generating  patterns  randomly  and 
keeping  the  number  of  classes  Nc  <  0.1 5N.  For  example,  a  Hopfield  net  for  a  simple  (Nc  =10 
class)  problem  would  require  more  than  70  nodes  and  over  5,000  connection  weights.  Second, 
exemplar  patterns  in  the  Hopfield  net  are  unstable  if  they  share  many  bits  in  common  with  other 
exemplars.  By  unstable,  we  mean  that  after  the  exemplar  is  applied  to  the  net’s  input,  the  net 
converges  to  a  different  exemplar. 

5.1.1.  Mathematical  Overview.  The  primary  distinction  between  traditional  or  classical 
NNs  and  MNNs  is  the  computation  performed  by  an  individual  neuron  Computation 
represented  by  a  classical  NN  (CNN)  with  a  linear  inner-product  kernel  (multiply-accumulate 
neuron)  is  based  on  the  algebraic  structure  (R,+,-).  The  governing  equation  of  a  CNNs  f' 
neuron  is  given  by 

t  j(x)  =  I  XiWy  -1  j  ,  (5.1) 

1=1 

where  the  input  vector  is  denoted  by  x  e  Rn,  x,  denotes  die  value  of  the  t*h  neuron,  wy  denotes 
the  synaptic  strength  between  the  z*h  and  f1  neurons,  Oj  denotes  the  threshold  of  the  7th  neuron, 
and  Xj  is  the  total  input  to  the  7th  neuron. 

In  contrast,  morphological  neural  networks  use  lattice  operations  v  (maximum)  and  a 
(minimum),  and  +  from  the  semi-rings  (R_„,v,+)  or  (R^.a.+O-  Here,  the  extended  real 
numbers  R_„  =  Ru{-«}  and  R„  =  Ru{«}  have  the  following  basic  arithmetic  and  logic 
operations.  The  symbols  +  and  +'  denote  the  usual  addition  operation  with  the  stipulation  that 
a  +  (-00)  =  (-00)  +  a  =  -00  Va  e  R_oo .  Similarly,  a+'oo  =  oo+'a=:ooVaeRM.  The  operations 
v  and  a  respectively  carry  the  additional  stipulation  that  a  v (-00)  =  (-00)  va  =  «Vae  R_co 
and  aA«  =  «JAa  =  aVae  R^ . 
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Note  that  the  symbol  acts  like  a  zero  element  in  (R_„,v,+)  if  one  views  v  as  addition 
and  +  as  multiplication.  Similar  comments  hold  for  °°  in  (Ro<,,a,+')-  Also,  the  role  of  the 
multiplicative  identity  in  the  structure  (R,+,-)  is  filled  by  the  number  1  (i.e., 

1  •  a  =  a  •  1  =  a  Vae  R).  In  the  structures  (R_„  ,v,+)  and  (rm  ,  a,+')  ,  this  role  is  filled  by  0,  since 
0  +  a=a+0=a  VaeR. 

Using  (R^,v,+) ,  computation  at  a  neuron  in  an  MNN  for  input  x  =  (xi, xn)  is  given  by 

n 

t  j  (x)  —  Pj  V  fy  (Xi  +  M>lj  )  ,  (5.2) 

i=l 

or 

n 

t  j  (x)  =  Pj  +Wy  )  (5.3) 

1=1 

where  r\j  =  +1  denotes  whether  the  Ith  neuron  causes  excitation  or  inhibition  on  the  7th  neuron,  and 
Pj  =  ±1  denotes  the  output  response  (excitation  or  inhibition)  of  the  /h  neuron.  For  excitatory 
input  and  output  action,  ry  =  1  and  pj=  1,  respectively.  Inhibitory  input  and  output  responses 
have  the  values  ry  =  -1  and  pj  -  - 1,  respectively.  The  computational  model  for  a  neuron  in  an 
MNN  that  uses  the  maximum  operator  is  illustrated  in  Figure  5.1. 

The  activation  function  f§)  in  an  MNN  s  generally  a  hard  limiter.  Thus,  the  governing 
equation  of  a  morphological  neural  net  using  the  maximum  operation  is  given  by  the  following 
equation: 

n 

t  y(x)  =  fPjV  rij(xi  +Wij )  . 

1  =  1 

Via  the  dual  structure  (R^,  a,+),  the  preceding  equation  becomes 

n 

t y(x)  -f  Pj  A ry(xi  +  wij)  . 
i=  1 

The  preceding  two  equations  represent  the  basic  morphological  operations  of  dilation  and 
erosion  -  hence  the  term  morphological  neural  network. 
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Figure  5.1:  Computational  model  of  a  morphological  neuron. 


Several  advantages  accrue  from  replacing  the  arithmetic  operations  in  traditional  neural 
computing  with  lattice-based  operations.  It  is  apparent  from  Equations  (5.2)  and  (5.3)  that 
morphological  neural  computation  does  not  involve  multiplications  but  only  the  operations  of 
logical  or  as  well  as  and  (respectively  implemented  as  maximum  and  minimum  over  the  Boolean 
numbers),  together  with  addition  and  subtraction.  This  provides  for  extremely  fast  neural 
computation,  and  an  algorithm  that  can  be  more  readily  implemented  in  hardware  than  a  classical 
NN  algorithm  such  as  the  Hopfield  net.  Convergence  problems  and  lengthy  training  algorithms 
are  often  nonexistent  [Rit97a,99a].  It  has  been  shown  that  morphological  associative  memories 
are  extremely  robust  in  the  presence  of  noise  and  have  unlimited  storage  capacity 
[Rit97b,98,99b;RitXX,Sus00].  There  is  a  close,  natural  connection  between  lattice  based 
computing  and  fuzzy  set  theory,  which  makes  lattice  based  networks  amenable  to  handling  more 
general  data  types  and  particular  types  of  learning  models  [Pet98].  Finally,  morphological  neural 
networks  are  capable  of  solving  any  conventional  computational  problem  and,  thus,  can  be 
inherently  useful  in  a  wide  variety  of  application  domains  [Rit96]. 


5.1.2.  Lattice  Algebra.  In  this  discussion,  we  use  the  lattice  algebra  (R„,A,+)  for  our 
underlying  neural  computation.  This  algebra  obeys  the  following  laws:  if  a,b,ce  RM,  then 

a  +  (b  Ac)  =  (a+b)A  (a  +  c) 


a  a°°  =  oo  a  a  =  a 

<2+oo  =oo+<2  =  oo 


(5.4) 


a  +  0  =  0  +  a  =  a 

To  compare  this  system  with  the  system  (R,+,- )  used  in  traditional  neural  computing,  we  have 
the  analogous  laws,  if  a,b,ce  R,then 

a  ■  (b  +  c)  =  (a  •  b)+  (a  ■  c) 


a  +  0  =  0  +  a  =  a 
a-0  =  0a=0 
a-l=  la  =  a 


(5.5) 
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Thus,  in  the  system  (Rm  ,A,+),  the  symbol  °°  acts  as  the  null  element  and  0  as  the  unit  element, 
if  we  view  a  as  addition  and  +  as  multiplication. 

In  addition  to  the  laws  given  in  Equations  (5.4)  and  (5.5),  the  two  systems  share  many  other 
properties  such  as  commutativity  and  associativity.  In  this  sense,  the  two  systems  are 
mathematically  very  similar  and,  therefore,  one  could  be  easily  substituted  for  the  other  in  many 
application  areas.  However,  there  are  several  differences,  a  major  one  is  the  lack  of  inveises  for 
the  lattice  operation  of  minimum,  namely  a  .  It  is  this  lattice  operation  that  provides  for  the 
unique  and  distinct  properties  of  morphological  neural  networks. 

It  must  also  be  mentioned  that  Koch  and  Poggio  make  a  strong  case  for  multiplying  with 
synapses,  i.e.,  for  xt  •  wy  [Koc92].'  In  the  model  presented  herein,  multiplication  could  have 

been  used  just  as  well.  A  mathematical  equivalent  theory  can  be  obtained  by  replacing  the  lattice 
(Roo,A,+)  with  (r-°,V,+),  where  R"®={xeR:r>0}.  The  mathematical  equivalence  is 

given  by  the  isomorphism  cp :  RM  R~°  defined  by  cp(x)  =  e~x,\/xe  R ,  and  (p(°°)  =  0 .  It  is 
readily  seen  that  <p  is  a  bijection.  Furthermore,  9(x  +  y)  =  9(x)  •  <p(y)  and 
90  A  y)  =  90) v  900  •  Therefore,  cp  preserves  the  algebraic  structure  and  is  an  isomorphism; 
note  also  that  cp(0)  =  1 .  Hence,  identities  are  mapped  to  identities.  The  inverse  of  9 ,  denoted  by 

9  * :  R_0  — >R„,  is  given  by  9  *(x)  =  —  ln(x)  for  x  >  0  and  9-*(0)  =  00 .  This  proves  that, 
mathematically,  nothing  is  gained  by  replacing  the  additive  lattice  with  the  multiplicative  lattice. 
Although  a  multiplicative  lattice  is  more  intuitive  from  a  biological  standpoint,  a  reason  for  using 
additive  synapses  is  the  reduction  of  computational  cost  and  ease  of  describing  the  proposed 
(morphological)  model  of  computation.  This  is  especially  true  for  emerging  forms  of 
morphological  neural  computing,  for  example,  computing  using  single  neurons  with  dendrites 
[Rit02], 

Additionally,  the  additive  lattice  is  directly  related  to  mathematical  morphology,  a  topic  well 
known  to  the  image  processing  and  computer  vision  community  [Dav92,Hei94],  We  realize  that 
some  researchers  are  not  familiar  with  the  notion  and  uses  of  algebraically  equivalent  structures. 
In  particular,  questions  may  be  raised  pertaining  to  the  meaning  and  use  of  negative  weights, 
which  arise  quite  naturally  when  using  the  lattice  (R00,A,+).  Traditionally,  weight  is  a  positive 
quantity  with  the  sign  determining  whether  the  input  is  excitatory  or  inhibitory.  Negative 
weights  have  nothing  to  do  with  inhibition  in  the  additive  lattice,  where  the  sign  of  ry  determines 
excitatory  or  inhibitory  input.  For  a  biological  (or  traditional)  interpretation  of  negative  weights, 
simply  observe  that  these  correspond  to  large  positive  weights  in  the  equivalent  multiplicative 

lattice  (rs0A,-  )  as  w  =  e  W‘J  >  0  Vwy  e  R  and  e~x  ->00  as  x  — >  . 

The  lattice  (Rm,A,+)  has  a  natural  dual  (isomorphic)  structure  given  by  (R_00,V,+),  where 
Vx  €  R  oo,  x+ (-00)  =  (-00)  +  x  =  -00 .  For  each  Vx  e  R±(X)  =  R  U  {-00,00} ,  we  define  its  dual  or 
conjugate  by  x*  =  -x,  where  -  (-<»)  =  «> .  The  following  duality  laws  are  a  direct  consequence  of 
this  definition: 

(**)*  =  x 

(xAy)*  =  x*vy*  (5.6) 

(x  v  y)*  =  x*  a  y* 
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The  function  \|/ :  R_oo  — » RM  defined  by  \| r(x)  =  x*  is  an  isomorphism.  Therefore,  a 
morphological  neural  net  using  die  operation  v  can  always  be  reformulated  in  terms  of  the 
operation  a  ,  and  vice  versa.  Due  to  the  relations  -  (x  a  y)  =  -x  v  -y  and  -  x  a  —y  =  ~(x  v  y) 

of  Equation  (5.6),  and  the  use  of  inhibitory  input  and  output,  the  algebra  (R_eo,V,+)  is 
implicitly  incorporated  in  Dr.  Ritter’s  current  research  in  dendritic  computation  in  morphological 
neurons  [Rit02], 

5.1.3.  Morphological  Operations.  Given  the  computations  represented  by  Equations  (5.2)  and 
(5.3),  let  V  denote  the  transpose  of  the  vector  v.  The  total  network  computation  for  a  matrix 
memory  such  as  a  fbpfield  or  Kohonen  memory  with  input  a  at  time  t  can  be  expressed  in 
matrix  form  as 

T{t+\)  =  Wa(t),  (5.7) 

where  a(t)  =  (ai(t),~-,an(t))' ,  T(t +1)  =  (ti(f +l),"-,T„(t+l))',  and  W  denotes  the  nxn 
synaptic  weight  matrix  whose  i/h  entry  is  wy.  Analogous  to  Equation  (5.7),  the  total 
morphological  network  computation  for  a  morphological  matrix  memory  can  also  be  expressed 
in  matrix  form.  In  order  to  do  this,  we  need  to  define  a  matrix  product  in  terms  of  the  operations 
of  the  lattice  structure  (R,v,a,+)  .  For  an  mxp  matrix  A  and  a  pxn  matrix  B  with  entries  from  R, 

the  matrix  product  C  =  A  E1B ,  also  called  the  max  product  of  A  and  B,  is  defined  by 

p 

cij  =  v  aik  +  hj  =  (ai\  +A y ) v (ai2  +  b2j)v-v (aip  +  bpj ) .  (5.8) 

k=\ 

The  min  product  of  A  and  B  induced  by  the  lattice  structure  is  defined  in  a  similar  fashion. 
Specifically,  the  ifh  entry  of  C  =  A  □£  is  given  by 

p 

Cjj  =  A  aik  +  bkJ  =  [ai{  +  hy )  a  [ai2  +  b2j )  a  •  •  •  a  (aip+bpj  )  .  (5.9) 

k=l 

The  total  MNN  computation  for  an  associative  memory  can  now  be  expressed  in  the  matrix 
forms 

T(t+l)  =  WMa(t)  (5.10) 

and 

T(t  +  l)  =  WUa(t)  (5.11) 

respectively. 

Some  additional  comments  concerning  lattice-based  operations  are  pertinent  when  discussing 
morphological  network  computations.  When  using  the  lattice  (R,v,a+),  the  maximum  (or 

minimum)  of  two  matrices  replaces  the  usual  matrix  addition  of  linear  algebra.  Here,  the  y-th 
entry  of  the  matrix  C -  Aw  B  is  given  by  Cy  =  ay  v by .  Similarly,  the  minimum  of  two 

matrices  C-  A  aB  is  defined  by  cy  =  ay  a  by .  Finally,  we  say  that  A  is  less  or  equal  than  B, 

denoted  by  A<  B,  and  A  is  strictly  less  than  B,  denoted  by  A  <  B,  if  and  only  if  for  each 
corresponding  entries  of  these  matrices  we  have  that  ay  <  by  and  ay  <  by,  respectively. 

The  algebraic  structure  (R,v,a+)  provides  for  an  elegant  duality  between  matrix  operations. 
For  any  real  number  r,  we  define  its  additive  conjugate  r*  =  -r.  Therefore, 

(r*)*=r  and  r  am  =  (r  *  v«*)*,  Vr,weR.  (5.12) 
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Now,  if  A-  ( ay)mxn  is  an  m  X  n  matrix  with  ay  e  R ,  then  the  conjugate  matrix  A*  of  A  is  tins  n't, 

m  matrix^*  =  (by)mxn  defined  by  by  =  [a/,]*,  where  [a/,]*  denotes  the  additive  conjugate  of  a ji.  It 
follows  that 

A  a  B  =  (A*v  B*)*  (5.13) 

and 

AEB  =  (A*EIB*)*  (5.14) 

for  appropriately  sized  matrices.  This  implies  that  a  morphological  neural  net  using  the 
operation  □  [i.e.,  Equation  (5.10)],  can  always  be  reformulated  in  terms  of  the  operation  □ 
[i.e.,  Equation  (5.11)],  and  vice  versa,  by  using  the  duality  relations  expressed  in  Equations 
(5.13)  and  (5. 14). 

Having  defined  the  necessary  mathematical  tools,  we  next  discuss  basick  properties  of 
morphological  neural  networks  and  present  some  examples. 


5.1.4.  Morphological  Associative  Memories.  The  ability  of  human  beings  to  retrieve 
information  on  the  basis  of  associated  cues  continues  to  elicit  great  interest  among  researchers. 
For  example,  a  few  pictures  from  a  movie  clip  can  evoke  memory  of  the  entire  movie’s  plot. 
Similarly,  a  glimpse  of  a  partially  occluded  face  in  a  crowd  can  be  sufficient  basis  for 
recognizing  an  old  friend.  Investigations  of  how  the  brain  is  capable  of  constructing  such 
associations  from  partial  information  have  led  to  a  variety  of  theoretical  neural- network  models 
that  act  as  associative  memories.  The  basic  goal  of  these  artificial  associative  memories  is  the 
retrieval  of  complete  sorted  pattern  from  noisy  or  incomplete  input  pattern  keys.  These 
associative  memories  are  categorized  as  hetereoassociative  or  autoassociative,  with  the  latter 
comprising  the  primary  focus  of  this  discussion.  Among  all  autoassociative  networks  the 
Hopfield  network  is  the  most  widely  known  [Hop82a-b,86].  A  large  number  of  researchers  have 
exhaustively  studied  this  network,  its  variations,  and  generalizations  [Abu85,Ami85,Che86, 
Den86,Kee86,Kos87a-b,McE87]. 

In  classical  neural  network-theory,  for  a  given  input  vector  or  key  x  =  (xj - * ,cc„ )',  an 
associative  memory  W recalls  a  vector  output  signal  f{ y),  where  y  =  W  ■  x.  If  /(y^)  =  y,-  Vi, 

then/(y)  =  y  and  the  memory  is  called  a  linear  associative  memory,  otherwise  it  is  referred  to  as 
a  semilinear  associative  memory.  A  basic  question  concerning  associative  memories  is:  What  is 

the  simplest  way  to  store  k  vector  pairs  (x^y1),---,^^),  where  x^  e  R”  and  y^  e  Rw  for 
t,  =  l,---,k  in  an  m  x  n  memory  W?  The  well-known  answer  to  thtat  question  for  linear  and 
semilinear  associative  memories  is  to  set 


W  =  Xy^  -(I*)'. 

5-1 

In  this  case,  the  if1  entry  of  W  is  given  by  wy  =  iLjyj’*^  .  If  the  input  patterns  x1 
orthonormal,  that  is 

then 

W-xl  =(y1  •(x1)'+,"  +  y*  '(x*)')-x*  =y* . 


(xJy-x*  = 


(5.15) 
,x*  are 

(5.16) 

(5.17) 
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Thus,  we  have  perfect  recall  of  the  output  patterns  y 1 ,  •  •  • ,  y  * .  If  x1 ,  •  •  • ,  x*  are  not  orthonormal 
(which  they  are  not  in  most  realistic  cases),  then  the  noise  term 

N  =  Xy7  •((xy)'-xl)  *0  (5.18) 

i*J 

contributes  to  crosstalk  in  the  recalled  pattern  by  additively  modulating  the  signal  term. 
Therefore,  filtering  processes  using  threshold  functions  become  necessary  in  order  to  retrieve  the 
desired  output  pattern. 

Morphological  associative  memories,  which  are  based  on  the  lattice  algebra  described  in 
Sections  5.2  and  5.3,  are  surprisingly  similar  to  these  classical  associative  memories.  Suppose 

we  are  given  a  vector  pair  x  =(x1,---,x„)/e  R”  and  y  =  Ov*',yOT)'e  Rm .  An  associative 
morphological  memory  that  will  recall  the  vector  y  when  presented  the  vector  x  is  given  by 

•••  y\~xn 

w  =  y  □  (-x)'  =  :  \  :  (5.19) 

Jm~x  1  -  ym-xn j 

since  W  satisfies  the  equation  W  EJ  x  =  y ,  as  can  be  verified  by  the  simple  computation 

(n-Xi+xi)' 

fTElx=  i  =  y .  (5.20) 

^  i=l  (ym  ~xi  +xi ) 

V  J 

Note  the  similarity  between  Equations  (5.15)  and  (5.19)  when  k  =  1.  The  natural  question 
one  may  ask  is  whether  or  not  this  concept  can  be  extended  to  cases  where  k  >  1 .  The  answer  is 
a  qualified  “yes”,  due  to  the  fact  that  problems  which  in  some  way  are  analogous  to  those 
associated  with  ordinary  associative  memories  [i.e.,  Equation  (5.18)]  also  occur  in  morphological 
associative  memories. 

Henceforth,  let  (x l,yI),---,(x^,yA:) be  k  vector  pairs  with  where  x^  =(xjv- •,*■»)  e  Rb  and 
y^  =(y^,-"5y«)e  Rm  for  H,  =  1, •••,£.  For  a  given  set  of  pattern  associations 
j(x^,y^):£  =  l,  •••,*}  we  define  a  pair  of  associated  pattern  matrices  (X,Y),  where 
X  =  )  and  7  =  (yV",y*)  .  Thus,  X  is  of  dimension  n  x  k  with  ifh  entry  xj  and7is 

of  dimension  m  x  k  with  ijth  entry  yj  . 

Since  y^E3 (-x^)'  =  y^EJ(-x^)',  we  reduce  notational  burden  by  denoting  these  identical 

morphological  outer  vector  products  by  y^  x(-x^)'.  With  each  pair  of  matrices  (7,7),  we 
associate  two  natural  morphological  mxn  memories  Wxy  and  Mxy,  defined  by 

Wxy  =  A  [y^  x(-x*)']  and  Mxy  =  v  [y^  x(-x^)'] .  (5.21) 

5=1  5=1 

It  follows  from  this  definition  that 

Wxy  —  y^  x(— x^)  -  M xy  VI;  =  1 ,•••,£.  (5.22) 
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In  view  of  Equation  (5.20),  this  last  set  of  inequalities  implies  that 

Wxy  Ex^  <  (y^  x(-x^)'Jox^  =  y^  =  |y^  x(-x^)'J<  Mxy  Qx^ 

V£,  =  !,•••.£  or,  equivalently,  that 


wXYnx<Y<MXY®x . 


Example  1.  Let 


x  =  0  ,  y  =  1  ,  x  =  -2  ,  y  =  -1  ,  x  =  -3  ,  y  =  -2  .  (5.25) 


According  to  Equation  (5.21),  the  memories  Wxy  and  Mxy  are  given  by 


=  A  y^x(-x^)'] 


(5.23) 


(5.24) 


0  00-11303  0 

=  1  1  1  A  -1  1  3  A  -2  1  -2 


00  0  02  4  03  0 


-10  0 


=  -2  1-2 

0  0  0 


(5.28) 


Mxy  =  v[y^x(-x^)'] 
1=1 


f 0  0  0]  f-1  1  3W  0  3  O'] 

=  1  1  .  1  v  -1  1  3  v  -2  1  -2  •  (5‘29) 

000  024  03  0 

v  /  v  /v 

r0  3  3^ 

=  113 

0  3  4 
v  / 

It  can  be  easily  verified  that  W^y  Ox^  =  y^  =  x^  holds  for  £  =  1,2,3.  For  example, 

r-\  0  0^  (0^  (o' 

Wxy  Q  x1  =  -2  1  -2  □  0  =  1  =yl  (5.30) 

0  0  0  0  0 


r0  3  3 (  0^  (-1> 

M xy  El  x 2  =  1  1  3  Q  —2  =  —  1  =  y ^ 
034  -4  0 


(5.31) 
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Definition  1.  A  matrix  A  =  (ay  )mXn  is  said  to  be  a  □  -perfect  memory  for  (X,  Y)  if  and  only  if 
A  E3X=  Y.  The  matrix  A  is  said  to  be  a  □  -perfect  memory  for  (X,  1)  if  and  only  if  A  0X  =  Y. 

Obviously,  A  is  □ -perfect  for  (X,Y)  if  and  only  if  AU  x^  =  y^  for  ^  .  Similarly^  A  is 

□  -perfect  for  (X,Y)  if  and  only  if  A  □  x^  =y^  for  £  =  1, •••,£.  Also,  if  A  is  □ -perfect  for 

(X,Y),  then  [a  □  x^  }=yf  for  all  ^  =  1,  -  -  * ,  k  and  all  i  =  Equivalently, 

V  (a y  +  x*=j  )=  y }  =  1,-  •  •,  k  and  i  =  1,  —  ,m .  (5.32) 

It  follows  that,  for  an  arbitrary  index  j  e  {1,  ■ •••,«},  we  have 

aij+x)-y ?  V^  =  l ,—,k 

^aij<y}-x)  =  (5.33) 

^ay<A^}-xj)=Wy 

This  shows  that  A  <  Wxy .  In  view  of  Equation  (5.26),  we  now  have 

Y  -A  El  X  <  Wxy  ^  x  <Y  and,  therefore,  Wxy  B  X  =  Y .  A  similar  argument  shows  that  if 

^  is  □  -perfect  for  (X,Y),  then  Mxy  <  B  and  M xy  □  X  —  Y .  This  establishes  the  following 
optimality  criteria  for  morphological  associative  memories: 

Theorem  1.  If  A  is  □  -perfect  for  (X,Y)  and  B  is  □  -perfect for  (X,Y),  then 

A<Wxy<Mxy<B  and  WXY®X  =  Y=MXy®  X.  (5.34) 

According  to  this  theorem,  Wxy  is  the  least  upper  bound  of  all  □  -perfect  recall  memories  and 
Mxy  is  the  greatest  lower  bound  of  all  □  -perfect  recall  memories  for  (X,  Y).  Furthermore,  if  there 
exists  a  □ -perfect  (□ -perfect)  recall  memory,  then  Wxy  (resp.  Mxy )  is  also  a  perfect  recall 
memory. 

The  next  obvious  question  concerns  the  existence  of  a  perfect  recall  memory.  Specifically, 
for  what  vector  pairs  frV),  ...,  (x*,y*)  will  Wxy  or  Mxy  provide  perfect  recall?  Once  this 
question  has  been  answered,  the  next  logical  question  pertains  to  the  amount  of  noise  Wxy  or  Mxy 

can  tolerate  for  perfect  recall;  i.e.,  if  x^  denotes  a  distorted  version  of  x^ ,  what  are  the 
conditions  or  bounds  on  x^  to  ensure  that  Wxy  □  x^  =  or  Mxy  □  x^  =  y^  ?  We  provide 

answers  to  these  questions  through  a  series  of  theorems  and  corollaries.  Proof  of  these  theorems 
are  given  in  [Rit98]. 

The  following  theorem  answers  the  existence  question  of  perfect  recall  memories  for  sets  of 
pattern  pairs. 

Theorem  2.  Wxy  is  ^-perfect for  (X,Y)  if  and  only  if  for  each  £  =  ,  each  row  of  the 

matrix  [y^  x(-x^)']- contains  a  zero  entry.  Similarly,  Mxy  is  □  -perfect  for  (X,Y)  if  and 
only  if,  for  each  ^  =  1,  •  •  • ,  k  ,  each  row  of  the  matrix  M xy  —  [y^  x  (— x^ )'  ]  contains  a  zero  entry. 
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Accordingly,  Wxy  E  x^  =  y^  Vi;  =  if  and  only  if  for  each  i;  and  each  row  index  i 

there  exists  a  column  index  j  (depending  on  £,  and  i)  such  that  Wy=  [y^x(-x^)']y ,  where  wy 

denotes  die  i/h  entry  of  Wxy-  Similarly,  Myy  0  ^  Vi;  =  1,  •  •  -,k  if  and  only  if  for  each  \ 
and  each  row  index  i  there  exists  a  column  index  j  (depending  on  t,  and  i)  such  that  wy- 

[y*x(-**)V. 

Example  2.  The  memories  Wxy  and  Mxy  given  in  Example  1  satisfy  the  conditions  of  Theorem 
2.  For  example,  for  i;  =  2,  we  have 

wn  =  y\  - x\  =  [y2 x(-x2)’]n 

w22  =yi  ~x2  =[y2x(-x2)']22  (5.35) 

w3i  =  yl  -  xi  =  [y2  x  (-x2)']3i 

and 

m\3  =  y\  ~x3=  [y2  x  (-x2  )']13 

m22  =y\-x2=  [y2 x(-x2)']22  .  (5.36) 

»»33  =yl~xi=  [y2  x  (-x2  )']33 
The  following  is  an  easy  consequence  of  Theorem  2. 

Corollary  2.1.  WXyE\X  =  7  if  and  only  if  for  each  row  index  i  =  1  ,•••,/»  and  each  ye  {1,  -  -  - ,  A} 
there  exists  a  column  index  ye  {1, •*•,«}  depending  on  i  and y  such  that 

x]  =  V{*)-y})+yh  (5.37) 

Mxy^X  =  Y  if  and  only  if  for  each  row  index  and  each  ye  (1,-  •  •  ,k)  there  exists  a 

column  index  je  (1, •••,«}  depending  on  i and  y  such  that 

x]=  Mx)-yh+yJ-  (5.38) 

The  next  theorem  provides  bounds  for  the  amount  of  distortion  of  the  exemplar  patterns  x^  for 
which  perfect  recall  can  be  assured. 

Theorem  3.  Let  xY  denote  a  distorted  version  of  the  pattern  xY.  Then  JFjyH  x7  =  yY  jf  ^ 
only  if 

(  \ 

m 

x]<x]vA  V[yJ -y>+xj]  Vy=l,-,n  (5.39) 

i=l  £*Y 

V  ) 

and  for  each  row  index  ie  {!,-••, w}  there  exists  a  column  index  jt  e  {!,-••,«}  such  that 
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*l=xl v  yi Vi-y}+x\l  • 


(5.40) 


Similarly,  0xY=  yY  if  and  only  if 


x] >x]aV  A [yj-y}+xj]  V/  =  l,-,n 

1=1 


(5.41) 


and  for  each  row  index  there  exists  a  column  index  jt  e  {!,-••, n}  such  that 


x].=xl  A  +xj=] 

•/»  ./l 


(5.42) 


The  subscript  notation  jt  denotes  the  dependence  of  the  column  index  j  on  the  particular  row 
index  i. 

Example  3.  Consider  the  following  set  of  associated  vector  pairs: 

<-5'\  f-A  O'!  2\  (5> 

x  =  -6  ,  y1  =  1  ,  x2=  -4  ,  y2=  5  ,  x3  =  0  ,  y3=  7  .  (5.43) 

V  V  l-3J  ^-1J  ^-1> 

If  x1  =  (-4-8-1/,  then  x)  <  x}  v  A3=  , (V|=2 [yj  -yf  +  xj])  for  j  =  1,  2,  and  3.  For 
instance, 

3^3  > 

x\vA  V[y}-yf+xf] 

/=!  £=2 


=  V 


A  |y)  -  yf  +  xf )  v  (yj  -  yf  +  X? 


=  -5  v  [((-1  -  2)  v  (-1  -  6))  a  ((-1  -  5)  v  (1  -  7))  A  ((-3  + 1)  v  (-3  +  5))] 
=  -5v[(-3)a(-4)a2] 

=  — 4  >  x  J 

Also,  if  i  =  1,  then  for  the  column  index  j  =  3  we  have 


(5.44) 
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*3  v  VoJ-jyf  +  xf] 
4=1 


2  ,  2 
■yi  +*3) 


=  x\  v  |y)  -  y\  +x32)v  (yj  -  y\  +  xj )] 

=-l  v  [((-1-2  +  0)  v  (-1-5+0))]  (5.45) 

=-lv[(-3)v(-7)] 

=  -l  =  *3 

A  similar  comparison  shows  that  for  i  =  2,  the  column  index  j  =  1  satisfies  the  equality 
*1  =*!  v  (V|=2 1>3  ~y}  +4ll  311(1  f°r  i  ~  3,  the  column  index  j  =  3  satisfies  the  equality 

*3  =x]  v(Vt=2  [>^3 +^|])  Therefore,  the  pattern  x1  satisfies  the  two  conditions  of 

Theorem  3  given  by  Equations  (5.41)  and  (5.42).  In  this  particular  case,  the  memory  WXy  is 
given  by 

(  2  5  0\ 


Wxy=  5  7 
-1  0 


(5.46) 


f  2  5 

W  □  x1  =  5  7 


2  0  -8 


r1  0  ~2)  \ 

(-2)  v  (-3)  v  (-1) 
lv(-l)vl 
(-5)  v  (-8)  v  (-3) 


=  l  =y! 


(5.47) 


which  is  in  agreement  with  Theorem  3. 

In  the  above  example,  x}  <  xj  and,  therefore,  x1  ^  x1 .  However,  perfect  recall  is  still 

achieved.  We  shall  reflect  on  this  observation  in  our  subsequent  discussion  on  perfect  recall  of 
corrupted  input  patterns. 

5.2.  Autoassociative  Morphological  Memories 

If  X  —  Y (i.e.,  y^  =  x^,  for  ^  =  1, •••,&),  then  Wxx  and  Mxx  are  called  autoassociative 
memories.  The  classical  example  of  a  semilinear  autoassociative  memory  is  the  Hopfield  net 
[Hop82a,Lip87].  As  shown  in  Section  5.1,  the  Hopfield  net  is  given  by 


Xp_t  xf  •  x)  if  i  *  j 
W:;  =  <  s=‘  1  J  J . 

0  if  i  =  y 


(5.48) 
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For  an  input  pattern  x  =  (xi, ...,  x„),  the  Hopfield  net  algorithm  proceeds  by  initializing  x/0)  =  xj 
at  time  t  =  0  and  then  iterates  the  recursive  formula  Xj(t +l)=/[X”_1wJyX,(f)]  until 

convergence.  The  function  /  is  a  hard  limiting  nonlinearity. 

The  morphological  analogues  of  the  Hopfield  memory  are  derived  from  Equation  (5.23)  and 
given  by 

k  k 

Wfj  =  A(xf-x^)  and  mif  =  V  (xj>  -x*h  .  (5.49) 

£=l  J  %=\  J 

As  an  example,  consider  the  five  pattern  images  p1, ...,  p5  shown  in  the  top  row  of  Figure  5. 
2.  Each  p^  is  an  18xl8-pixel  Boolean  image.  Using  the  standard  row-major  scanning  method, 

each  pattern  image  p^  can  be  converted  into  a  pattern  vector  x^  =  (xjV*  •  ,x|24 )  by  defining 

£  _ll  if  =  1  {black  pixel) 

18 (*  ^+-/  0  if  p^(z,y)=0  (white  pixel) 

In  this  case,  we  obtain  the  perfect  recall  Wxx El  x^  =  x^  and  M  xx^  x^  =  x^  for  4  =  1,  ...»  5. 
However,  using  a  Hopfield  memory  for  these  patterns  does  not  result  in  perfect  recall  even  in  a 
nondistorted  pattern  is  presented  to  the  memory.  The  bottom  row  of  Figure  5.2  shows  the  output 
of  the  Hopfield  net  when  presented  with  the  patterns  representing  the  letters.  Although  the 
patterns  “A”  and  “X”  resulted  in  perfect  recall,  the  patterns  “B”,  “C”,  and  “E”  all  converged  to  a 
configuration  not  represented  by  any  of  these  patterns.  The  reason  for  this  is  that  these  three 
patterns  are  not  very  orthogonal  to  each  other;  i.e.,  there  is  a  large  subset  of  nonzero  pixels 
common  to  all  three  patterns.  It  is  well  known  that  Hopfield  memories,  which  are  linear 
combinations  of  pattern  features,  have  difficulties  in  memorizing  patterns  that  share  too  many 
bits  [Lip87]. 

ImMnP  • 


AC  G*  -  V*  E2 


Figure  5.2.  The  five  patterns  in  the  top  row  were  used  in  constructing  the  Hopfield  memory  and 
the  morphological  memory.  The  output  for  either  autoassociative  morphological  memory  Wxx 
or  Mxx  is  identical  to  the  input  patterns.  The  bottom  row  represents  the  output  patterns  of  the 
Hopfield  net  when  presented  with  the  respective  patterns  in  the  top  row. 


Although  some  improvements  could  be  obtained  by  using  certain  orthogonalization 
procedures  [Wal85,Gra86],  an  even  more  severe  limitation  for  die  Hopfield  memory  is  the 
number  of  patterns  that  can  be  stored  for  accurate  recall.  For  instance,  doubling  the  number  of 
patterns  from  five  to  ten  by  introducing  the  lower  case  letters  shown  in  Figure  5.3  results  in  a 
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complete  recall  failure, 
with  the  patterns  shown 

Figure  5.4  shows  the  output  from  the  Hopfield 
in  Figure  5.3. 

memory  when  presented 

A 

B 

G 

X 

E 

a 

b 

c 

X 

(3 

Figure  5.3.  The  ten  patterns  used  in  constructing  the  Hopfield  memory  and  the  morphological 
memories.  The  output  of  either  autoassociative  morphological  memory  Wxx  or  Mxx  is  identical 
to  the  input  patterns. 


Figure  5.4.  The  output  of  the  Hopfield  memory,  which  remains  the  same  no  matter  which  one  of 
the  patterns  shown  in  Figure  5.3  is  presented  to  the  memory. 


5.2.1.  Perfect  Recall.  In  contrast  to  the  Hopfield  memory,  in  the  absence  of  noise  a  natural 
morphological  autoassociative  memory  will  always  provide  perfect  recall  for  any  number  of 
patterns  programmed  into  its  memory!  The  fact  that  morphological  autoassociative  memories 
have  unlimited  storage  capacity  is  given  by  the  next  theorem. 

Theorem  4.  WxxElX  =  X  and  MxxHX  =  X . 

Note  that  the  theorem  places  no  restriction  on  the  type  or  number  of  patterns.  In  particular, 
the  natural  morphological  memories  give  perfect  recall  for  all  ten  patterns  or,  if  desired,  the 
entire  Roman  alphabet  of  upper-  and  lower-case  letters.  Additionally,  a  consequence  of 

Theorem  4  is  that  the  memories  Wxx  and  Mxx  are  stable  with  respect  to  the  input  vectors  x^.  In 
fact,  as  the  next  theorem  shows,  the  memories  converge  in  one  step  for  any  input  pattern  z. 
Theorem  5.  If  Wxx&  z  =  v  and  Mxxn  z  =  u,  then  WXjf3  v  =  v  and  u  =  u . 

Theorems  4  and  5  point  out  another  major  difference  between  Hopfield  memories  and 
morphological  autoassociative  memories.  Not  only  is  there  an  immense  difference  in  storage 
capacity,  but  perfect  recall  is  guaranteed  and  recall  occurs  in  one  step  that  can  be  implemented  in 
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parallel.  In  contrast,  Hopfield  memories  must  be  run  iteratively  with  neurons  firing  randomly, 
one  at  a  time,  and  there  is  no  guarantee  of  perfect  recall  for  perfect  inputs.  Thus,  morphological 
associative  memories  behave  more  like  human  associative  memories.  Once  a  pattern  has  been 
memorized,  recall  is  instantaneous  when  the  human  brain  is  presented  with  the  pattern. 

5.2.2.  Recall  of  Corrupted  and  Occluded  Patterns.  Since  the  Hopfield  net  is  not  capable  of 
recalling  a  large  number  of  perfect  exemplar  patterns,  the  Hopfield  net’s  performance  is  even 
further  degraded  when  these  patterns  are  corrupted  by  noise  or  occluded  by  foreground  artifacts. 
In  contrast,  the  autoassociative  morphological  memories  discussed  thus  far  are  extremely  robust 
to  certain  types  of  noise  and  occlusions. 


Figure  5.5.  The  top  row  shows  the  corrupted  input  patterns  and  the  bottom  row,  the 
corresponding  output  patterns  of  the  morphological  memory  Wxx- 


We  say  that  the  distorted  version  xY  of  the  pattern  xY  has  undergone  an  erosive  change 

whenever  xY<xY  and  a  dilative  change  whenever  xY>x^.  Thus,  for  the  above  Boolean 
patterns,  a  change  in  pattern  values  from  p (if)  =  1  to  p (if)  =  0  represents  an  erosive  change, 
while  a  change  from  p (if)  =  0  to  p (if)  =  1  represents  an  dilative  change.  The  morphological 
autoassociative  memory  Wxx  is  extremely  robust  in  recalling  patterns  that  are  distorted  due  to 
erosive  changes.  These  changes  can  be  random  (system  noise,  partial  pattern  occlusion,  etc.)  or 
nonrandom  (processing  effects  such  as  skeletonizing,  filtering,  etc.)  The  limitations  on  the 
amount  of  distortions  are  given  by  the  theorems  stated  in  the  preceding  section.  For  example, 
defining  Wxx  in  terms  of  the  ten  patterns  shown  in  Figure  5.3,  and  corrupting  any  of  the  patterns 
with  50  percent  randomly  generated  erosive  noise  (i.e.,  black  pattern  pixels  turned  white 
randomly,  pixel  by  pixel)  always  resulted  in  perfect  recall.  Figure  5.5  shows  three  experimental 
examples  where  the  pattern  represented  by  die  letter  “X”  was  intentionally  corrupted  by  use  of 
large  erosive  changes.  In  contrast  to  the  Hopfield  net,  which  failed  to  recall  the  exemplar  pattern 
in  each  case,  the  morphological  autoassociative  memory  provides  perfect  recall  in  all  three  cases. 
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Figure  5.6.  The  top  row  shows  the  corrupted  input  patterns  and  the  bottom  row,  the 
corresponding  output  patterns  of  the  morphological  memory  Mxx- 


Although  the  performance  of  the  autoassociative  morphological  memory  Wxx  is  excellent  in 
the  presence  of  erosive  noise,  pattern  recall  becomes  impossible  when  dilative  noise  in 
introduced.  For  dilative  noise,  the  dual  memory  Mxx  proves  to  be  the  ideal  associative  memory. 
This  follows  from  Theorem  4  and  is  also  supported  by  experiment.  For  example,  defining  Mxx  in 
terms  of  the  ten  patterns  shown  in  Figure  5.3  provides  for  an  associative  memory  that  is 
extremely  robust  in  the  presence  of  dilative  noise  of  these  patterns.  For  instance,  the  top  row  of 
Figure  5.6  shows  several  version  of  die  letter  “X”  corrupted  by  severe  dilative  changes.  In  each 
of  these  cases,  the  memory  Mxx  provided  perfect  recall. 

The  problem  with  the  associative  memories  Wxx  and  Mxx  is  that  Wxx  is  incapable  of 
effectively  handling  dilative  noise  while  Mxx  is  incapable  of  effectively  handling  erosive  noise. 

A  consequence  of  Theorem  3  is  that  if  xY>xY,  then  in  many  instances,  ( Wxx  B  xY)i*^,and 

if  xY>xY,  then  we  have  a  high  probability  that  [mxx  H  xy\*y}.  Hence,  dilative  noise 
destroys  perfect  recall  for  Wxx  while  erosive  noise  destroys  recall  tor  Mxx-  However,  we  need  to 
recall  that  Example  3  shows  that  a  certain  amount  of  dilative  noise  may  not  destroy  perfect  recall 
for  Wxx-  A  similar  observation  holds  for  erosive  noise  and  Mxx- 
5.3.  Perfect  Recall  Conditions  for  Distorted  Inputs 

Although  theorems  provide  necessary  and  sufficient  bounds  for  the  class  of  inputs  that  will 
result  in  perfect  recall,  they  also  point  out  the  extreme  vulnerability  of  the  morphological 
memories  Wxx  and  Mxx  if  patterns  exhibit  both  erosive  and  dilative  noise.  For  this  reason,  the 
vector 

Wxx®  fa  XX  E  xY)  (5.51) 

will  generally  not  correspond  to  xY  if  xY  corresponds  to  a  corrupted  version  of  xY  containing 
both  erosive  and  dilative  noise.  In  fact,  since  xY  contains  erosive  noise,  the  expression 
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MXX  0  will  most  likely  correspond  to  the  zero  vector.  However,  the  naive  idea  of  first 
presenting  a  noisy  pattern  to  a  memory  that  is  insensitive  to  dilative  noise  followed  by  presenting 
the  output  pattern  to  a  memory  that  is  insensitive  to  erosive  noise,  as  represented  by  Equation 
(5.51),  does  furnish  a  first  step  in  constructing  an  associative  morphological  memory  that  is 
robust  in  the  presence  of  random  noise,  random  occlusion,  as  well  as  being  capable  of  recalling 
either  skeletonized  or  dilated  versions  of  patterns. 

The  fundamental  idea  of  creating  a  morphological  associative  recall  memory  that  is  robust  in 
the  presence  of  random  noise  (i.e.,  both  dilative  and  erosive  random  noise)  is  to  replace  the 
associative  memory  Mxy  with  a  sequence  of  two  memories  M\  and  W\  that  are  defined  in  terms 
of  subpattems  of  the  exemplar  patterns  x\  Basically,  the  memory  M\  is  constructed  such  that  it 
can  associate  randomly  corrupted  version  of  the  exemplar  x  with  a  subpattem  zs  consisting  of  a 
few  select  pattern  values  from  x  \  The  pattern  z*  represents  a  special  eroded  version  of  x  .  In 
the  Boolean  case,  ideally,  z*  is  a  sparse  representation  of  x^ having  the  property  that  z3 «  x3 and 

z^axt=0  whenever  £  *  y.  Here  «  denotes  “much  smaller  than”  in  the  sense  of  containing 
mostly  zero- valued  pixels  ,  and  0  denotes  the  zero  vector.  The  morphological  memory  M\  is 

defined  as  an  associative  memory  for  the  patterns  x^,  £  =  1  ,•••,£  so  that  Mj  E3  x^=z^. 
Furthermore,  since  x^  represents  a  version  of  z  corrupted  only  by  dilative  noise,  we  also  have 
M\  □  x^  =  . 

The  memory  W\  is  defined  as  an  associative  memory  that  associates  with  each  input  x^  the 
pattern  y^.  Thus,  under  the  correct  conditions,  we  obtain  W\  □  (mj  □  x^)=  W\  □  =  y^  as 

well  as  W\  □  (mjO  =  W\  □  zS  =  y^  for  randomly  corrupted  patterns  x^ . 

We  now  make  these  notions  more  rigorous. 

Definition!.  Let  Z  =  (zl  ,z2,---,zk)  be  an  nxk  matrix.  We  say  that  Z  is  a  kernel  for  (X,  Y)  if  and 
only  if  the  following  two  conditions  are  satisfied: 

1.  Mzz  □  X  =  Z; 

2.  WZY  0  Z  =  Y. 

It  follows  that  if  Z  is  a  kernel  for  (AT,  Y),  then 

WZY  0  {MZZ  □  X)=WZYUZ=  Y.  (5.52) 

Equation  (5.52)  defines  the  morphological  associative  memory  system 

input  -»  MZz  ->  Wxy  output  (5.53) 

which,  for  a  properly  chosen  kernel  Z,  will  be  robust  in  the  presence  of  random  noise  in  input  X. 

Definition  2  does  not  provide  for  a  methodology  of  finding  kernels  nor  does  it  furnish  direct 
insight  into  the  question  of  why  kernels  are  useful  in  constructing  associative  memories  that  are 
robust  in  the  presence  of  random  noise.  The  next  theorem  provides  a  partial  answer  to  these 
questions. 

Theorem  6.  IfZ  is  a  kernel for  (X,Y),  then  Z<X. 

If  Z  =  (z1,z2,---,z^)  is  a  kernel  for  (X,Y),  then  the  column  vectors  x^ of  Z  are  called  kernel 

vectors  for  (X,Y).  According  to  Theorem  6,  z1  <  x*  for  %  =  1,  2,  k.  Thus,  the  kernel  vectors 
represent  eroded  versions  of  the  patterns  x1, ...,  xk. 

Example  4.  The  set 
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provides  another  example  of  a  set  of  kernel  vectors  for  the  vector  pairs  given  in  Example  1 .  This 
shows  that  kernel  vectors  are  not  unique. 

The  definition  of  kernel  vectors  does  not  preclude  that  z>  =  x^  for  some  £  or  even  that  Z=X. 
of  course,  if  Z-  X,  then  nothing  will  have  been  gained  in  the  quest  for  creating  morphological 
associative  memories  that  are  robust  in  the  presence  of  random  noise.  For  the  same  reason,  cases 
where  z^  =  x^  for  some  £  need  to  be  avoided.  This  leads  to  the  notion  of  proper  kernels.  If  Z  is 

a  kernel  for  (X,Y)  with  the  property  that  z^  *  x^  ,  then  Z  is  called  a  proper  kernel  for  (X,Y). 
Note  that  the  two  kernels  in  Example  4  are  not  proper  kernels  since  for  both  kernels,  we  have  £ 

=  x2.  However,  the  notion  of  a  proper  kernel  does  not  imply  that  z^  <  x^  V£,  It  only  implies 

that  for  each  %  =  1,  2,  ...,  k,  there  exists  an  index  €  {1, •••,«}  such  that  z|  *  x}  .  In  other 

words,  each  kernel  vector  zs  has  at  least  one  coordinate  that  differs  from  the  corresponding 
coordinate  of  x\  According  to  Theorem  6,  this  coordinate  must  be  strictly  less  than  the 
corresponding  coordinate  of  x\ 

If  zY  <xY  <xY,  then 

zY  =  Mzz  □  zY  <  Mzz  □  xy  <  Mzz  □  xY  =  zY  (5.57) 

and,  hence,  Mzz  □  xY  =  zY.  Thus,  if  Z  is  a  kernel  for  (X,Y)  and  zY  <  xY  <xY,  then  we  are 
guaranteed  that 

WaU  (MzzH  r)-y*.  (5.58) 

Therefore,  an  eroded  version  xY  of  xY  satisfying  the  inequality  zY  <xY  will  be  correctly 
associated  with  the  pattern  y Y .  Since  Mzz  is  very  robust  in  the  presence  of  dilative  noise,  it  is 
not  unreasonable  to  expect  that  a  distorted  version  xY  of  xY  containing  both  dilative  and  erosive 

noise  while  satisfying  the  inequality  zY  <xY  will  be  correctly  associated  with  the  pattern  yY. 
This  expectation  turns  out  to  be  true  and  the  next  theorem  provides  conditions  that  guarantee 
perfect  recall  for  corrupted  input  patterns  xY. 
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Theorem  7.  If  Z  is  a  kernel  for  (. X,Y)  and  xY  denotes  a  distorted  version  of  xY  such  that 
zj  <Xj  y/  =  l,---,n  and  zj  =  xj  whenever  mzz(i,j)  =  zj  - z  J ,  then  □  xY  =  zY. 

In  this  theorem,  mzzkij)  denotes  the  ifh  entry  of  the  matrix  Mzz-  The  theorem  also  shows 
that  any  corrupted  version  xY  of  xY  which  contains  both  dilative  and  erosive  noise  but  satisfies 
the  hypothesis  will  be  correctly  associated  with  the  pattern  yY  when  fed  into  the  system 
{ input  — >  M zz  — >  WZy  — ^  output } . 


Figure  5. 7:  An  example  of  kernel  images.  The  kernel  image  corresponding  to  a 
particular  letter  image  is  the  image  directly  below  the  letter  image. 


Theorems  6  and  7  can  be  viewed  as  a  guide  for  selecting  kernel  vectors.  A  first  step  is  to 
erode  the  patterns  x^ ,  saving  only  a  few  unique  pattern  values  in  order  to  construct  a  kernel 

vector  such  that  axy  =0  whenever  This  may  not  always  be  possible.  However, 

when  it  is  possible,  it  provides  unique  identification  markers  for  the  memory  MZz  and  the 
conditions  stated  in  Definition  2  are  automatically  satisfied.  In  general,  however,  such  unique 
features  may  not  exist  for  some  index  %.  In  this  case,  once  an  initial  set  of  kernel  sector 
candidates  has  been  selected,  the  two  conditions  of  Definition  2  need  to  be  checked  in  order  to 
ensure  correct  recall. 
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We  employed  this  method  for  obtaining  kernel  vectors  x^  and  corresponding  weight 
matrices  Mzz  and  Wzx  for  the  ten  patterns  shown  in  Figure  5.3.  The  corresponding  kernel  images 
are  shown  in  Figure  5.7.  The  autoassociative  memory  {input  — » M zz  — *  ^ZY  output) 
obtained  from  this  set  of  kernel  vectors  proved  to  be  extremely  roust,  recognizing  all  patterns 
even  after  each  pattern  was  corrupted  by  randomly  reversing  each  bit  with  a  probability  of  0.15. 
Figures  5.8  and  5.9  represent  a  sampling  of  our  inputs  to  the  net  and  the  corresponding  outputs. 


A 

Figure  5.8.  An  example  of  the  behavior  of  the  memory  {input  — >  — >  W^y  — >  output}, 

which  was  trained  using  the  ten  exemplars  shown  in  Figure  5.3.  Presenting  the  memory 
with  the  corrupted  patterns  of  the  letters  A,  B,  and  X  resulted  in  perfect  recall  (lower  row). 
Each  letter  was  corrupted  by  randomly  reversing  each  bit  with  a  probability  of  0.15. 

An  additional  benefit  of  using  kernel  patterns  is  the  increase  in  storage  capacity  for 
associations  QC,  Y).  This  is  due  to  the  fact  that  Mzz  I  3  perfect  recall  memory  for  any  finite 

number  of  pattern  vectors  and  because  of  the  sparseness  of  eroded  patterns  (i.e.,  in  the 
Boolean  case  most  pattern  values  will  be  zero),  the  storage  capacity  of  WZY  can  be  greatly 
increased. 
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Figure  5.9.  Another  example  of  the  behavior  of  the  memory 
{input  — >  M zz  Wzy  output } .  Presenting  the  memory  with  the  corrupted 
patterns  of  the  letters  c.  E,  and  e  resulted  in  perfect  recall  (lower  row).  Again, 
each  letter  was  corrupted  by  randomly  reversing  each  bit  with  a  probability  of 
0.15. 

5.4.  Comparison  with  Previous  Work 

In  recent  years,  lattice-based  matrix  operations  have  found  widespread  applications  in  the 
engineering  sciences.  In  these  applications,  the  usual  matrix  operations  of  addition  and 
multiplication  are  replaced  by  corresponding  lattice  operations.  Lattice  induced  matrix 
operations  lead  to  an  entirely  different  perspective  of  a  class  of  nonlinear  transformations.  These 
ideas  were  applied  by  Shimbel  [Shi54]  to  communications  networks,  and  to  machine  scheduling 
by  Cuninghame-Green  [Cun60,62]  and  Giffler  [Gif60].  Others  have  discussed  their  usefulness 
in  applications  featuring  shortest-path  problems  in  graphs  [Bac75,Ben68,Car71,Pet67]. 
Additional  examples  are  given  in  [Cun79],  primarily  in  the  field  of  operations  research. 

The  concept  of  morphological  networks  grew  out  of  the  theory  of  image  algebra 
[Rit01,RitMS.Rit91,Rit90a].  It  was  shown  that  a  subalgebra  of  image  algebra  includes  the 
mathematical  formulations  of  currently  popular  neural  network  models  [Rit89,Rit91]  and  first 
attempts  in  formulating  useful  morphological  neural  networks  appeared  in  [Rit90b]  and  [Dav90]. 
Applications  to  image  processing  were  first  developed  by  Ritter  and  Davidson  [Rit90b,Dav89]. 
Davidson  employed  morphological  neural  networks  to  solve  template  identification  and  target 
classification  problems  [Dav92b,93a-c].  C.P.  Suarez-Araujo  applied  morphological  neural  nets 
to  compute  homothetic  auditory  and  visual  invariances  [Sua92,97].  Another  interesting  network 
consisting  of  a  morphological  net  and  a  classical  feedforward  network  used  for  feature  extraction 
and  classification  was  designed  by  Paul  Gader  and  his  colleagues  [Gad94,Won95],  These 
researchers  devised  multilayer  morphological  neural  nets  for  very  specialized  applications.  A 
more  comprehensive  and  rigorous  basis  for  computing  with  MNNs  appeared  in  [Rit96]  where  it 
was  shown  that  morphological  neural  nets  are  capable  of  solving  any  conventional  computational 
problem.  Ritter,  Sussner,  and  Diaz-de-Leon  [Rit98]  investigated  morphological  associative 
memories,  a  class  of  networks  not  previously  discussed  in  detail. 
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An  entirely  different  model  of  a  morphological  network  was  presented  in  [Wil89].  This 
particular  model  employs  the  customary  operations  of  multiplication  and  summation  at  each 
node,  which  is  fundamentally  different  from  the  models  described  herein. 

5.5.  Applications  and  Future  Work 

We  have  developed  a  new  framework  of  neural  network  computing  based  on  lattice  algebra. 
The  resulting  morphological  neural  networks  are  radically  different  in  behavior  than  traditional 
artificial  neural  network  models.  The  main  emphasis  of  this  paper  was  on  morphological 
associative  memories.  In  contrast  to  traditional  associative  memories,  morphological  associative 
memories  converge  in  one  step!  Thus,  convergence  problems  do  not  exist.  Morphological 
analogues  to  the  Hopfield  net  not  only  proved  to  be  far  more  robust  in  the  presence  of  noise,  but 
also  have  unlimited  storage  capacity  for  perfect  inputs.  For  noisy  inputs  and  carefully  chosen 
kernels,  morphological  autoassociative  memories  again  exhibit  superior  performance  in  both 
recall  and  storage.  In  our  experiment,  the  autoassociative  morphological  memory 
{input  ^  M 77  — >  Wzy  output)  was  capable  of  recalling  62  grid  patterns  defined  on  grids  of 
size  21x21  pixels.  These  patterns  represent  the  entire  English  alphabet  of  upper-  and  lower-case 
letters  together  with  integer  digits  ranging  from  0  to  9.  Corruption  of  these  patterns  by  randomly 
reversing  each  bit  with  a  probability  of  0.08  and  presenting  these  corrupted  patterns  to  the 
memory  also  resulted  in  perfect  recall. 

It  should  be  emphasized  that  the  results  presented  herein  represent  only  a  first  step  in  the 
exploration  of  morphological  neural  networks.  The  lattice  algebraic  approach  to  neural  network 
theory  is  new  and  a  multitude  of  open  questions  await  exploration.  For  example,  we  noted  that 
the  selection  of  kernel  patterns  merits  further  investigation.  This  is  especially  true  when  using 
gray- valued  patterns.  Selection  of  proper  sets  of  kernel  vectors  for  Boolean  patterns  is  usually 
easy,  while  the  selection  of  proper  sets  for  gray- valued  kernel  patterns  can  be  difficult.  To  obtain 
a  better  understanding  of  this  difficulty,  we  challenge  the  reader  to  find  a  proper  set  of  kernel 
vectors  for  the  set  of  vector  pairs  defined  in  Example  1.  Of  course,  this  represents  only  a  small 
part  of  the  overall  challenge.  Learning  rules  for  both  associative  morphological  memories  and 
morphological  perceptrons  are  under  development,  but  need  to  be  further  investigated.  The  base 
of  applications  needs  to  be  expanded  and  further  compared  to  traditional  methods.  Dr.  Ritter  and 
his  students  are  currently  investigating  the  use  of  single  neurons  for  computation,  including 
applications  in  target  recognition.  Early  results  have  shown  that  morphological  models  for  single 
neurons  with  dendritic  computation  can  be  employed  in  target  classification  (e.g.,  landmine 
detection  applications)  with  very  high  probability  of  detection  and  low  false  alarm  rate. 

In  a  larger  sense,  MNN  models  are  connected  with  the  fundamental  question  concerning  the 
difference  between  biological  neural  networks  and  artificial  neural  networks,  namely. 

Is  the  strength  of  the  electric  potential  of  a  signal  traveling  along  an  axon  the  result  of  a 
multiplicative  process,  and  does  the  mechanism  of  a  neuron ’s  postsynaptic  membrane 
add  the  various  potentials  of  electrical  impulses,  or  is  the  strength  of  the  electric 
potential  an  additive  process  and  does  the  postsynaptic  membrane  only  accept  signals  of 
a  certain  maximum  strength? 

A  positive  answer  to  the  latter  query  would  provide  a  strong  biological  basis  for  morphological 
neural  networks.  In  Section  6,  we  discuss  the  application  of  MNNs  to  processing  of  the  TNE 
agreement  map. 
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6.  Combination  of  MNNs  and  TNE  in  Classify-Before -Detect  ATR 

In  this  STTR,  we  established  two  distinct  approaches  to  target  classification  using 
morphological  neural  networks  to  refine  classifier  output.  Section  6.1  describes  how  MNNs  can 
be  employed  in  reconfiguration  of  the  TNE  agreement  map  (AM).  Section  6.2  describes  the 
application  of  Dr.  Ritter’s  current  research  in  feedforward  MNNs  and  dendritic  computation,  and 
how  these  techniques  can  be  used  to  enhance  AM  processing. 


6.1.  AM  Processing  with  MNNs 

Consider  a  signal  vector  Vs®,  which  denotes  output  of  a  sensor  s  at  time  t,  which  is  of  form 


AO 


=  (vl 


s(t)  AO 
>v2  ’ 


•>v; 


s(t) 


vs(0 

•5  vn 


)T. 


(6.1) 


where  T  denotes  transpose  and  e  N ,  with  N  denoting  the  set  of  positive  integers.  For  each 
1=1,...,  n,  the  Ith  component  vf®  of  \s^  references  a  particular  row  of  the  TNE  binary 

pointer  table  entry  corresponding  to  the  value  v/^ .  The  AM  table  corresponding  to  vs^  is  a 
new  table  consisting  of  n  binary  rows,  where  the  i*h  row  consists  of  the  TNE  binary  pointer  table 
entry  corresponding  to  Vj.  Figure  6.1  illustrates  the  construction  of  the  AM  for  v5^.  Let 

w denote  the  AM  for  .  Then,  w5^  can  be  viewed  as  a  vector,  where  ^  denotes 

the  i*h  element  of  this  vector,  which  is  the  Ith  row  of  .  Alternatively,  can  be 

expressed  as  the  integer  value  that  corresponds  to  a  binary  representation  of  the  i*h  row  of  the 
AM. 


TNE  Binary  Pointer  Table  TNE  Agreement  Map  for  ys^ 


Figure  6.1.  Schematic  diagram  of  how  the  TNE  agreement  map  for  vs^  is 
obtained from  the  TNE  binary  pointer  table. 

Suppose  an  ATR  classification  problem  has  k  target  classes.  Then,  it  is  possible  to  train  a 
morphological  perceptron  using  a  training  set  fw^ , c^ ) : £  =  1 , . . . .w},  where  c^e  l,....k  and 

belongs  to  class  j  e  {l, ...,£}  if  and  only  if  c^=  j .  As  shown  in  [Rit98],  training  is  very 

fast  and  the  training  set  will  always  be  correctly  classified.  After  we  do  this  training  for  each 
sensor  s  =  1,  ...,  M,  the  morphological  perceptron  will  assign  each  test  vector  obtained  from 
actual  observation  to  a  particular  target  class.  For  a  suite  of  M  sensors,  the  perceptron 
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classification  produces  an  M  x  K  target  classification  table,  whose  column  sums  will  yield  the 
most  likely  target.  More  precisely,  if 

m 

k;  =  I xt  > k{  (6.2) 

i=l  7 

for  every  /=  1,  k,  l*  j ,  then  die  target  belongs  to  class  j.  Here,  xt.  =1  if  w5^  has  been 

classified  by  the  morphological  perceptron  as  belonging  to  class  j,  else  xt  =  0 .  Figure  6.2 
illustrates  this  process. 


w'  Target  Class 


Figure  6.2.  AM  assignment  to  a  target  class  by  the  morphological  perceptron  technique. 


6.2.  Dendritic  Computation  for  Processing  Significant  AM  Information 

The  schematic  diagram  of  the  processing  sequence  shown  in  Figure  6.3  illustrates  the  simple 
scheme  of  summing  across  components  on  an  AM  and/or  across  (over  time)  multiple  AMs. 
Obviously,  summing  across  components  may  help  identity  a  possible  target,  but  often  provides 
only  partial  and  sometimes  conflicting  information  as  it  does  not  account  for  contiguous  values 
of  ones.  For  example,  given  a  vector  x  =  (jq,  x2, ...,  *io)  obtained  fom  a  row  of  a  TNE  AM, 

where  xi  =  X4  =  xs  —  xio  ==  1  and  x,-  =  0  for  i  -£  1,4,8,10 ,  then  XjEi xi =  4-  However,  it  may  be 
statistically  more  significant  to  know  if  the  value  4  thus  obtained  represents  contiguous  ones. 
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Figure  6.3:  Schematic  diagram  of  AM  processing  using  component  summation:  We  are  free  to 
operation  on  the  3D  binary  AM  (a)  along  exemplar  components  in  single  or  multiple 
agreement  maps,  or  (b)  by  component(s)  across  multiple  agreement  maps.  Thus,  MNNs 
can  access  the  agreement  map  space  in  multiple  ways,  to  increase  versatility  and 
robustness,  while  at  the  same  time  (i)  computing  target  density  per  test  vector  by 
summing  across  components,  and  (ii)  fusing  multiple  inputs  expressed  as  multiple 
agreement  maps,  by  summing  along  each  component. 


To  obtain  this  type  of  information  from  a  collection  of  AMs,  we  can  employ  a  simple 
feedforward  morphological  neural  network  that  needs  no  training.  For  the  above  example  with 

x  =  (1,0, 0,1,  0,0, 0,1, 0,1) 
the  output  of  our  feedforward  MNN  would  be 

y  =  (1,0, 0,1, 0,0, 0,1, 0,1). 


However,  if 


x  =  (0,1, 1,1, 1,0,  0,0,  1,0) 

the  output  of  the  MNN  would  then  be 

y  =  (0,4, 1,1, 1,0,  0,0, 1,0). 
In  the  latter  case,  we  obtain  two  types  of  information: 


1  •  SjEj  Xj  =  5,  which  tells  us  that  there  are  five  positive  responses,  and 

2.  =  4,  which  indicates  that  among  the  five  positive  responses,  four  are  contiguous. 

This  type  of  feedforward  network  with  both  excitatory  and  inhibitory  neurons  has  been  described 
in  [Rit02].  To  provide  a  partial  example  and  illuminate  the  concept  underlying  the  network,  we 
consider  a  simple  case,  where  x  =  (x\,  xi,  x{).  Figure  6.4  schematically  illustrates  this  network 
and  provides  all  the  axonal  branches  terminating  on  output  neuron  y\ .  The  inhibitory  neurons  are 
denoted  by  z/.  The  activation  function  for  the  inhibitory  neurons  is  defined  by 


/(*)  = 


0 

oo 


if  z  >  0 
if  z  <  0 


(6.3) 
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Figure  6.4.  Simple  example  of  AM  processing  network  for  three-input  test  vector  x  = 

(xi,  X2,  x$),  showing  the  axonal  branches  and  dendritic  tree  for  output  neuron  yi  only. 

The  neurons  labeled  zi  and  Z2  are  inhibitory  neurons.  The  dashed  lines  are  intended 
only  to  distinguish  axons  associated  with  X2  and  %  from  those  associated  with  other 
neurons. 

The  input  neurons  have  no  activation  function,  but  simply  send  their  values  down  their  axons  and 
subordinate  axonal  branches.  The  output  neurons  also  have  no  activation  functions  as  we  want 
to  read  the  total  computed  value.  The  computation  of  the  /h  output  neuron  is  given  by  the 
following  formula: 

DJ  , 

yj  =  V  (6.4) 

A: — 1 


where  xJk  (x)  denotes  the  computed  value  of  the  kA  dendrite  of  output  neuron  j  and  Dj  denotes 
the  total  number  of  dendrites  of  output  neuron  j.  The  value  of  xk  (x)  is  given  by 

■'fw  t  -fo+»s*).  (6.5) 

iel(k)  ze  J(k) 

where  x,-  denotes  the  value  of  the  input  neuron, 

zt  denotes  the  value  of  the  ith  inhibitory  neuron, 

I(k)  denotes  the  set  of  input  neurons  having  terminal  branches  on  the  kA  dendrite  of 
output  neuron  j, 

J(k)  denotes  the  set  of  inhibitory  neurons  having  terminal  branches  on  the  it*11  dendrite  of 
output  neuron  j, 
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Wijk  denotes  the  axonal  branch  weight  of  the  Ith  input  neuron  terminating  on  the  ft * 
dendrite  of  output  neuron  j,  and 

Vyk  denotes  the  axonal  branch  weight  of  the  fh  inhibitory  neuron  terminating  on  the  ft* 
dendrite  of  output  neuron  j. 

Thus,  to  compute  the  value  of  x  =  (x\,x2,  X3)  =  (1, 1, 0),  we  have 
3 

yi=Vx\  (x)  =  x\  (x)  v  x\  (x)  V  T3  (x) 
ft=l 

-  (1  +  0)  v  [(1+1)  a -(0-2)  a -(0-2)]  v  [(0  +  2)  a -(0-3)  a -(0-3)] 

=  2 

Similarly  for  x  =  (1,  1,  1)  we  obtain  yx  =  3,  while  for  x  =  (1,  0,  jc3)  and  x  =  (0,  x2,  x3)  we 
respectively  obtain  y  1  =  1  and  y  \  =  0,  where  x2  and  x3  can  have  the  value  0  or  1 . 

6.3.  Implementational  Advantages  of  AM  Processing  with  MNNs 

Due  to  the  potential  size  of  the  agreement  map,  processing  of  the  AM  can  be  potentially 
burdensome.  We  have  elsewhere  shown  that  it  is  possible  to  process  the  TNE  agreement  map 
with  Boolean  operations  implemented  in  a  parallel  or  reconfigurable  processor  [Key99]  or 
network  of  processors.  However,  MNNs  present  the  advantage  of  a  fundamental  increase  in 
processing  efficiency  for  performing  contiguity  detection  simultaneously  with  target 
classification  in  O(log  n)  time.  This  concurrency  cannot  be  realized  in  traditional  vector 
processing  techniques  without  prohibitive  replication  of  hardware.  Analysis  and  several 
examples  in  support  of  these  claims  follow. 

As  before,  assume  that  we  have  n  vector  components,  m  sensors,  and  ft  targets  possible.  The 
agreement  map  shown  schematically  in  Figure  6.5  thus  has  kmn  elements,  each  of  which  is 
shown  I  that  particular  example  as  one  bit.  Thus,  any  operation  that  processes  all  locations  in  the 
agreement  map  at  least  once  (as  exemplified  in  Figure  3  by  the  bold  arrows  indicating 
summation)  will  incur  work  W  =  O (kmn).  Since  ft  can  be  quite  large  (e.g.,  103  to  106  in 
practice),  and  m  is  typically  much  smaller  (e.g.,  1  to  100),  the  efficiency  of  TNE’s  AM 
processing  algorithm(s)  depend(s)  on  making  n  as  small  as  possible  without  losing  source 
resolution  by  introducing  excessive  sampling  error  [Key99].  Assuming  that  16  <n<  256  is 
usual,  we  have  1.6  x  104  <  kmn  <  2.56  x  1010,  a  range  of  over  two  orders  of  magnitude. 

However,  as  discussed  in  Section  5,  MNNs  can  process  input  in  one  pass  of  the  network. 
However,  in  the  example  given  in  Section  6.2,  Equation  (6.5)  requires  n2  partial  minima  to  be 
computed.  Fortunately,  this  can  be  computed  on  a  tree- structured  architecture,  leading  to  O(log 
n2)  =  0(log  n)  time  complexity  and  0(n)  space  complexity,  since  the  processing  elements  from 
stage  i  can  be  recycled  in  the  computation  of  stage  (1  +  1)  if  double  buffering  is  employed.  This 
concept  is  illustrated  notionally  in  Figure  6.5. 

In  particular,  the  number  of  processing  operations  required  to  compute  Equation  (6.4),  shown 
in  the  upper  part  of  Figure  5.5a  for  n  =  4,  and  Equation  (6.5),  is  given  for  each  yj  by 

W (ft)  =  3  +  [/(ft)  -  1 1  + 1 /(ft)  -  1 1  comparison  operations  +  |/(ft)|  +  |/(ft)|  additions. 

This  implies  that  a  tree- structured  architecture  could  compute  Equation  (6.4)  in  O(log|/(ft)|  + 
logp(ft)|)  addition  operations,  since  comparison  is  typically  implemented  with  subtraction,  which 
is  implemented  in  terms  of  addition. 
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Each  leaf  is  an  instance  of  the  term 
Tj[(x) ,  given  in  Equation  (6.5),  as 
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Figure  6.5:  Computation  of  MNN  result  with  a  parallel  tree-structured  architecture:  (a) 
diagram  of  partial  minima  and  global  maximum  computation  for  n  =  8,  (b)  double 
buffering  used  to  recycle  processors  for  multi-stage  hierarchical  pipelining  of  MNN 
computation.  Observe  that,  in  a  tree-structured  architecture,  processor  Pn  could  be 
used  in  Stage  i  but  not  in  Stage  i  -  1. 
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7.0  Conclusions  and  Future  Work 

Research  at  FTI  and  UF  on  Phase-I  of  the  GASP  project  focused  on  establishing  the 
theoretical  and  practical  basis  for  MNN-based  processing  of  multiple  TNE  agreement  maps 
Emerging  technologies  such  as  Morphological  Neural  Nets  and  Dendritic  Computing  with 
MNNs  were  integrated  into  the  research  results.  All  tasks  in  the  SOW  were  addressed,  as  shown 
in  Section  1.  In  this  section,  we  summarize  our  Phase-I  findings,  then  discuss  future  work  for  a 
possible  Phase-H  research  and  development  effort. 

7.1.  Research  Findings 

The  combination  of  disparate  classifier  outputs  is  fraught  with  difficulty,  primarily  due  to 
differing  sensor  parameters  (e.g.,  a  spectral  detection  process  versus  a  time-of-flight  based 
detection)  and  data  quality  (e.g.,  high  computational  error  corrupting  one  classifier  output,  while 
another  classifier  is  sensitive  to  image  detector  noise  propagated  through  one  or  more  of  the 


69 


STTR  Army02-T004  Topic  Title:  Analysis  and  Characterization  of  Pattern  Classifiers; 

GASP  -  Generator  for  Adaptive  Statistical  Pattern  Recognition  Systems  January  31 ,  2003 

aforementioned  postprocessing  computations).  Various  techniques  such  as  linear  or  nonlinear 
combination,  Bayesian  or  unsupervised  classification,  and  supervised  learning  techniques  such  as 
neural  networks  have  been  proposed  for  refining  the  output  of  individual  classifiers  or  combining 
such  outputs  to  produce  a  more  accurate  classification. 

Accordingly,  we  have  investigated  Bayesian  classifiers  (Section  3),  Frontier  Technology’s 
TNE  pattern  recognition  paradigm  (Section  4),  and  UF’s  Morphological  Neural  Networks 
(MNNs)  pattern  recognition  paradigm  (Section  5)  for  both  classifier  and  post-classifier 
processing,  as  described  in  Section  2.  Furthermore,  we  have  investigated  and  formulated 
approaches  for  processing  the  TNE  agreement  maps  using  MNNs  (Section  6).  The  latter 
techniques  represent  potentially  important  new  developments  in  pattern  recognition,  which 
promise  greater  accuracy  and  information  storage  capacity,  as  well  as  improved  refinement  of 
classified  sensor  output. 

We  have  found  that  nonadaptive  Bayesian  classifiers,  especially  in  a  fixed,  rule-based  pattern 
recognition  paradigm,  lack  sufficient  flexibility  to  track  input  changes  (e.g.,  nonstationarities)  in 
a  manner  that  preserves  or  increases  Pd  and  minimizes  or  decreases  Rfa.  Several  additional 
problems  with  classical  Bayesian  pattern  recognition  theory  include  the  unrealistic  requirement 
that  all  probabilities  be  known  and  accurately  computed  prior  to  classification.  This  stipulation 
can  only  be  satisfied  with  full  model-based  coverags  of  die  input  pattern  space,  which  has  been 
frequently  shown  to  be  infeasible  in  ATR  field  practice.  It  is  well  known  that  model-based  ATR 
often  leads  to  brittle  classifiers  that  incompletely  cover  input  space  because  they  are  designed  to 
account  for  known  instances  of  input  patterns.  However,  the  interesting  phenomena  in  ATR 
field  practice  are  usually  those  of  which  one  has  little  or  no  foreknowledge.  It  is  these 
unforeseen  effects  that  typically  render  a  target  classifier  brittle  or  failed  in  practice. 

An  alternative  to  pure  Bayesian  classifiers  is  FTI’s  TNE  pattern  recognition  paradigm,  which 
is  grounded  in  Bayesian  pattern  recognition  theory  at  a  high  level,  but  allows  for  uncertainty  in 
the  pattern  space,  noise  or  partial  information  in  test  vector  input,  and  can  combine  classifier 
results  in  a  data  fusion  paradigm.  Another  alternative  is  neural  network  based  pattern 
classification,  which  can  be  rendered  adaptive  via  a  learning  algorithm.  Dr.  Ritter  has  developed 
Morphological  Neural  Nets  (MNNs)  as  highly  accurate,  efficient  classifiers  that  are  designed  to 
replace  classical  neural  networks  in  a  wide  variety  of  signal  and  image  processing  applications. 
MNNs  have  the  advantages  of  theoretical  maximum  information  storage  capacity,  fast  and 
accurate  convergence  (usually  in  one  pass  of  the  net),  and  short  training  times.  Additionally, 
theory  that  predicts  the  performance  of  MNNs  under  partial  information  or  noise  has  been 
published  in  the  literature  and  summarized  in  this  report. 

In  this  Phase-I  research  effort,  we  examined  how  MNNs  could  be  applied  to  processing  of 
the  TNE  agreement  map  to  further  increase  classification  accuracy.  If  one  views  TNE’s  target 
database,  distance  function,  and  thresholding  parameter(s)  as  a  priori  knowledge,  then  one  can 
think  of  the  columns  of  TNE’s  agreement  map  as  the  partial  classifier  results.  We  have  shown 
how  MNNs  could  be  applied  to  these  columnar  partitions  to  (a)  render  partial  classification  more 
robust,  (b)  combine  results  from  adjacent  columns  to  increase  the  likelihood  of  accurate  target 
classification  in  the  classify-before- detect  ATR  paradigm,  and  (c)  direct  further  statistical 
processing  or  optimization  of  MNN/TNE  results. 

In  Section  4,  Section  5,  and  Appendix  A,  we  discuss  two  advanced  technical  issues,  namely 
(1)  the  use  of  array  processors  to  compute  the  TNE  or  MNN  algorithms,  and  (2)  the  use  of 
pointwise  versus  componentwise  performance  measures  of  Pd  and  Rfa  to  trigger  MNN 
processing  of  the  agreement  map.  Since  TNE  and  MNNs  are  based  on  matrix  operations  (e.g., 
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vector  correlation  and  pointwise  bitmap  operations  in  TNE,  with  nonlinear  vector- matrix 
products  in  MNNs),  it  is  possible  to  parallelize  these  operations  for  computation  on  embedded 
SIMD  array  processors  or  reconfigurable  computers  such  as  field  programmable  gate  arrays 
(FPGAs).  We  showed  that,  for  a  target  database  comprised  of  M  exemplars,  classification  of  N 
samples  of  K  pixels  each  could  be  computed  in  W  =  0(KMN)  work.  In  practice,  since  the  bits 
that  comprise  the  agreement  map  can  be  operated  upon  in  parallel  fashion  (e.g.,  Boolean  vector 
processing),  it  is  possible  for  W  to  have  a  small  proportionality  constant.  We  also  showed  that 
MNNs  could  compute  in  0(N(logK  +  logM))  time,  given  sufficient  parallelism  and  I/O 
bandwidth. 

7.2.  Future  Work 

Because  of  our  concerns  pertaining  to  classifier  performance  in  the  presence  of  nonstationary 
input,  we  propose  follow-on  research  (e.g.,  Phase-2  of  the  current  STTR)  to  develop  adaptive 
pattern  recognition  capabilities  for  TNE  that  would  be  directed  by  classification  performance 
results.  We  propose  to  achieve  this  objective  via  the  following  innovations: 

1.  MNN-directed  TNE  target  database  optimization  -  It  is  well  known  that  template-based 
pattern  recognition  systems  perform  more  efficiently  given  a  smaller  number  of  target 
templates.  Consider  the  problem  of  detecting  multiple  rotated  versions  of  the  same  target. 
Although  it  may  be  difficult  to  analytically  determine  the  optimal  rotation  angle  quantization 
level,  this  can  be  done  empirically  by  (a)  merging  similar  target  templates,  then  (b) 
constructing  a  Receiver  Operating  Characteristic  (ROC)  curve  from  Pd  and  Rfa  analysis  of 
TNE  classifier  output  obtained  from  ATR  imagery.  When  Pd  and  Rfa  begin  to  underperform 
with  respect  to  specifications,  then  it  can  be  argued  that  the  optimal  quantization  level  has 
been  exceeded.  However,  this  tells  us  nothing  about  the  underlying  merging  process  inherent 
in  target  template  combination,  and  its  effect  on  TNE  classifier  performance.  Neither  does 
this  assumption  consider  cosine  or  tangent  projection  laws  that  cause  adjacent  views  in  out- 
of-plane  rotations  near  45  degrees  to  appear  significantly  different  visually,  as  opposed  to 
out-of-plane  rotations  near  zero  degrees,  which  appear  almost  identical.  Apparently,  a 
nonlinear  rotation  angle  quantization  scheme  would  be  needed  to  compensate  for  such 
projection  effects. 

Proposed  Work.  In  Phase-II,  we  propose  to  investigate  the  concept  of  adaptive  quantization 
using  MNNs  to  cluster  rotated  targets  according  to  the  expected  viewpoint.  This  approach, 
which  we  call  viewer-adaptive  training  set  optimization,  would  allow  the  inter- view  spacing 
(e.g.,  angular  quantization  step  size)  to  be  varied  as  a  function  of  out-of-plane  rotation  angle 
as  well  as  performance  variables  such  as  Pd  or  Rfa.  Thus,  we  would  more  closely  approach 
optimization  of  the  target  database  from  a  physical  (i.e.,  viewer-dependent)  or  mathematical 
(e.g.,  ATR  filter  sensitivity)  perspective.  Additionally,  we  propose  to  explore  the  effect  of 
noise  and  contrast  degradation  (e.g.,  effects  of  intensified  imaging  in  night  viewing 
applications)  on  angular  quantization  levels,  again  from  a  performance  perspective  grounded 
in  Pd  and  Rfa  based  metrics. 

2.  Performance  directed  target  database  and  AM  processing  optimization  -  As  noted  in  Item 
1),  above,  when  Pd  and  Rfa  no  longer  meet  performance  specifications,  one  should  be  able  to 
employ  automatic  re-optimization  of  the  TNE  template  database,  as  well  as  apply  MNN- 
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assisted  TNE  agreement  map  processing,  as  investigated  in  Phase- 1 .  For  example,  if  a  small 
angular  quantization  interval  is  selected  for  rotated  targets,  this  might  be  increased  for  night 
viewing,  due  to  the  effects  of  image  intensifier  noise  on  (a)  image  resolution  and  (b)  resultant 
TNE  correlation  performance  between  a  test  vector  and  each  pattern  in  the  template  database. 
We  note  that  there  are  many  other  parameters,  in  addition  to  angular  resolution,  that  can 
serve  as  bases  for  determining  whether  or  not  to  cluster  targets  in  the  TNE  database.  These 
should  also  be  investigated,  as  described  below. 

Proposed  Work.  In  Phase- II,  we  propose  to  investigate  (a)  parameters  that  could  be  used  to 
direct  template  clustering  in  the  TNE  database,  as  well  as  (b)  theory  and  algorithms  for 
relating  TNE  database  structure  to  TNE  performance  in  terms  of  Pd  and  Rfa.  Example 
parameters  include  but  are  not  limited  to  camera  noise,  measures  of  input  nonstationarity, 
target/background  contrast  and  target  rotation/scaling  effects.  Although  there  has  been  some 
investigation  of  these  effects  for  TNE-based  compression,  additional  research  is  required  to 
transfer  these  technology  developments  and  insights  to  the  less  constrained  application 
domain  of  target  recognition.  Additionally,  we  propose  to  investigate  how  MNNs  could  be 
used  to  cluster  TNE  templates  based  on  well-established  similarity  or  content  metrics 
published  in  the  literature. 

3.  MNN-assisted  TNE  classifier  refinement  —  Given  results  produced  in  the  Phase- 1 
investigation,  it  is  readily  apparent  that  MNNs  have  considerable  present  and  future  utility  in 
TNE  agreement  map  processing.  At  die  present  time,  MNNs  can  determine  saliency  of 
partial  classification  results  (e.g.,  columns  in  the  TNE  agreement  map)  or  can  be  configured 
to  operate  on  the  agreement  map  to  combine  columns  in  a  data  fusion  paradigm.  We  would 
also  note  that  it  is  possible  to  process  the  agreement  map  using  statistical  procedures  such  as 
Analysis  of  Variance  (ANOVA)  to  determine  diagonal  orientations  of  detections  within 
columns  of  the  agreement  map,  which  can  indicate  phase  shifts  among  similar  targets  or 
target  types.  We  thus  propose  the  following  work  items  for  a  possible  Phase-II  study. 
Proposed  Work.  In  Phase-II,  we  propose  to  extend  the  research  performed  in  Phase-I, 
wherein  we  investigated  the  feasibility  of  applying  MNNs  to  process  the  TNE  agreement 
map.  In  particular,  we  propose  to  investigate  techniques  for  interrogating  the  agreement  map 
while  employing  other  than  rowwise  or  columnwise  partitions.  In  Phase-I,  we  found  that 
interrogation  of  AM  rows,  or  spatially  clustered  groups  of  rows,  supported  preliminary 
selection  of  possible  targets  that  were  classified  and  detected  within  those  spatial  partitions. 
This  information  could  be  used  to  downselect  additional  target  templates  for  multi-pass 
recognition,  or  to  select  strategies  for  measuring  or  analyzing  Pd  or  Rfa  within  TNE 
sampling  partitions.  In  the  latter  case,  this  would  partly  support  sub-block  detection  (e.g.,  for 
small  features  or  targets)  under  Pd  and  Rfa  constraints  or  metrics  that  are  customized  for  the 
expected  target  types.  Additionally,  in  Phase- 1,  we  found  that  columnwise  processing  with 
MNNs  could  implement  data  fusion  through  combination  of  partial  classification  results 
specific  to  similar  target  instances  or  types. 

In  Phase-II,  we  propose  to  extend  these  research  results  to  search  for  other  types  of 
detection  patterns  within  an  agreement  map.  This  could  be  done  by  a  spatially  based 
ANOVA  to  detect,  for  example,  diagonal  patterns  indicative  of  phase  shifts.  As  an 
illustration  of  this  concept,  consider  the  situation  where  the  cannon  or  turret  of  a  tank  is 
detected  in  a  digital  image.  However,  this  turret  might  be  in  a  slightly  different  position  in 
different  tank  templates,  due  to  lack  of  template  co-registration  as  well  as  scale  or  rotation 
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differences.  Rather  than  scanning  the  AM  row-wise,  it  would  be  useful  to  detect  whether 
there  were  phase  shifts  within  a  group  of  content-related  templates  (e.g.,  similar  tank  types 
with  similar  out-of-plane  angular  rotations).  In  practice,  this  approach  would  be  useful  for 
refining  the  classification  result  to  achieve  more  accurate  target  identification. 

The  result  of  the  proposed  future  work  would  be  an  adaptive  and  more  highly  refined  target 
or  pattern  classification  paradigm.  In  this  envisioned  GASP  system,  which  we  outlined  in  the 
Phase- 1  proposal,  ATR  imagery  would  be  processed  by  TNE’s  sampling  module  and  applied  to 
the  TNE  engine  to  yield  an  agreement  map.  The  TNE  engine  would  employ  a  target  template 
database  that  would  be  optimized  as  described  previously,  to  adapt  to  changing  input  conditions 
according  to  Pd  and  Rfa  results.  MNNs  would  process  the  TNE  agreement  map,  as  described  in 
item  3),  above,  and  these  MNNs  would  also  be  constrained  by  Pd  and  Rfa.  If  time  and  resources 
permit,  it  could  be  useful  to  investigate  MNN-directed  adaptation  of  TNE  for  target  classification 
in  variable-noise  scenarios  typical  of  intensified  cameras  (e.g.,  night  vision  applications). 

The  proposed  research  would  produce  a  powerful  fusion  of  neural  nets  and  TNE  to  yield 
adaptive  classification  compensated  for  physical  effects  such  as  noise  that  manifest  in  the  pattern 
space.  This  represents  a  new  direction  in  pattern  recognition  with  potentially  high  payoffs  in  a 
wide  variety  of  military  and  domestic  applications. 


Appendix  A:  Advanced  Technology  Summary 

Several  technology  areas- of- need  have  been  highlighted  in  this  Phase- 1  study.  The  two  most 
prominent  areas  are  (1)  embedded  parallel  or  reconfigurable  processor  technology  to  support 
TNE  computation  (discussed  in  Section  B.l)  and  (2)  the  use  of  pointwise  versus  componentwise 
measures  of  Pd  or  Rfa  to  trigger  MNN-based  codebook  optimization  or  MNN-directed 
processing  of  the  TNE  agreement  map  (discussed  in  Section  B.2). 

B.l.  Embedded  Parallel  Computation  of  TNE 

This  section  contains  a  high-level  design  and  summary  analysis  for  an  ATR  architecture 
based  on  (a)  the  tabular  nearest-neighbor  (TNE)  pattern  recognition  paradigm,  (b)  one  or  more 
reconfigurable  processors,  and  (c)  a  scalable  multi-port  I/O  system.  In  particular,  the  proposed 
architecture  is  based  on  a  common  bus  that  is  designed  to  facilitate  switching  of  binary  input 
(two  operands)  to  selected  reconfigurable  computer  (RC)  units.  The  processing  units  are 
designed  to  be  reconfigurable  to  support  different  operations  over  die  agreement  map  (AM),  a 
data  structure  key  to  the  implementation  of  TNE.  Since  the  key  challenge  of  implementing  TNE 
on  RCs  is  the  I/O  overhead  associated  with  retrieval  of  the  agreement  map  rows,  followed  by 
column  operations  over  the  agreement  map,  we  first  emphasize  I/O  cost.  This  approach  is 
germane  due  to  low  computational  cost  -  in  practice,  addition  and  Boolean  logic  operations  only 
are  required.  Performance  estimates  presented  herein  are  highly  conservative,  and  would  be 
increased  considerably  by  more  advanced  parallel  or  reconfigurable  processing  technology  that 
will  likely  be  available  at  the  time  of  possible  prototype  construction. 

B.1.1.  Introduction  to  Parallel  Reconfigurable  Processor  Design  Issues:  Frontier 
Technology’s  TNE  pattern  recognition  paradigm  requires  retrieval  of  bit  vectors  from  memory 
containing  preprocessed  templates  (PTM).  Each  bit  vector  comprises  a  row  of  an  intermediate 
data  structure  called  an  agreement  map,  which  changes  with  each  position  of  a  source  image 
sampling  window.  In  this  context,  a  system  designer’s  first  concern  should  be  the  I/O  operations 
required  to  move  information  in  and  out  of  the  agreement  map.  A  second  design  issue,  which  is 
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no  less  important,  is  how  to  efficiertly  retrieve  AM  data  from  the  PTM  and  transfer  it  into  an  RC 
processor,  such  that  the  processor  can  operate  on  the  AM  data  efficiently.  Additional  design 
issues  include,  but  are  not  limited  to,  minimization  of  the  degree  of  parallelism,  degree  of 
memory  port  replication,  and  complexity  of  the  RC  processing  circuitry. 

B.l.1.1.  Assumption.  Given  (a)  K  pixels  per  sampling  window  or  block  applied  at  selected 
domain  points  of  an  N-pixel  source  image,  (b)  a  maximum  of  K  pixel  shifts  per  block,  and  (c)  K 
bit  vectors  retrieved  from  memory  comprised  of  L  bits  per  row  vector,  the  following  minimum 
I/O  work  is  required  for  AM  access,  during  processing  of  a  source  image  sequence  comprised  of 
F  frames  per  second  (fps)  with  R  replicates  per  frame: 

Wj/o  ^  KNLFR  bits/sec  (bps).  (B.l) 

Under  the  prespecified  constraints,  Wi/o  describes  the  approximate  number  of  I/O  operations 
required  to  move  the  AM  data  only.  Note  that  Wi/o  does  not  include  I/O  overhead  required  for 
RC  reconfiguration  or  for  moving  source  data  or  processed  output  (results)  into  buffer  memory. 
B.l.1.2.  Example.  If  K  =  256  pixels,  N  =  1M  pixel,  L  =  15,000  templates,  F  =  1  fps,  and  R  =  1, 
then  KNLFT  =  4.02  terabits  per  second  (Tb/s).  At  video  rates  (F  =  30fps),  this  product  increases 
to  120  Tb/s.  This  is  clearly  a  problematic  data  rate  for  architectures  based  on  relatively  slow  I/O 
bus  technology,  such  as  field-programmable  gate  arrays  (FPGAs). 

B.l.1.3.  Observation.  The  work  requirement  shown  in  Equation  (B.l)  can  be  ameliorated  by 
subsampling  each  (a)  source  block  (sampling  window)  at  a  fraction  £  of  K  pixel  shifts,  (b) 
image  at  a  fraction  £  of  N  possible  source  block  positions,  (c)  AM  at  a  fraction  £  of  its  L 
columns,  and  (d)  image  sequence  at  fractions  £  of  frame  rate  F  and  fr  of  R  frame  replicates.  The 
work  requirement  can  thus  be  expressed  as 

WI/Q  >  (fK  •  K)(fN  •  N)/fL  •  L)/fF  •  F)Tr  •  R)  bits/sec  (bps),  (B.2) 

which  can  be  rendered  tractable  to  current  memory,  bus,  and  RC  technology  if  the  factor 

f  =^K  fN  fL  ‘  *F  fR 

is  kept  sufficiently  small. 

B.l.1.4.  Example.  Given  the  values  K  =  256  pixels,  N  =  1M  pixels,  L  =  15,000  templates,  F  =  1 
fps,  and  R  =  1,  let  =  0.1  (e.g.,  10  percent  of  source  points  underlie  a  pixel  value  deemed  to  be 
part  of  an  interesting  candidate  target).  Further  let  lk  =  0.5  (i.e.,  half  of  the  pixels  in  the 
sampling  window  are  randomly  selected),  and  let  £  =  0.1  (i.e.,  there  exists  a  tenfold  reduction  in 
AM  size,  for  example  via  a  hierarchical  access  scheme).  Given  £,fR  =  1,  Wi/o  is  computed  from 
Equation  (B.2)  as: 

W„0  >  (0.5  •  256)(0.1  •  1M)(0.1  •  15,000)  bps  =  20.13  Gbps  . 

For  example,  this  estimate  of  I/O  might  be  feasible  using  high-bandwidth  multiport  I/O  and 
parallel  processing  technology,  as  discussed  below. 

B.l.1.5.  Observation.  Observation  B.l.1.3  implies  that  several  design  strategies  could  be 
concurrently  employed  to  mitigate  the  I/O  cost  in  the  presence  of  relatively  low  FPGA  clock 
speed  supported  by  current  technology  (e.g.,  200  MHz  to  500MHz).  A  common  remedy  for  this 
situation  is  I/O  parallelism.  In  a  practical  parallel  processing  scenario,  I/O  cost  could  be  shared 
among  P  processors,  each  of  which  could  have  M  I/O  ports.  In  that  case,  the  maximum 
reduction  in  I/O  cost  would  be  expressed  as  the  product  PM. 

Another  technique  that  appears  to  be  applicable  to  TNE  is  hierarchical  partitioning  of  the 
AM  domain,  in  particular,  the  column  coordinate.  For  example,  one  could  group  a  nonzero 
fraction  £  of  L  templates  specific  to  a  given  target  T  to  yield  LH  =fL  •  L  high-  level  templates. 
Each  of  these  high-level  templates  could  be  associated  with  a  lower- level  partition  of  template 
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space,  of  mean  size  L/Lh  templates.  The  best-match  score  at  a  high  level  would  point  to  the 
most  likely  partition  to  be  searched  at  a  lower  level.  This  strategy,  which  FTI  personnel  have 
indicated  is  feasible  in  practice,  would  reduce  Wi/o  by  an  average  amount  £. 

A  third  method  that  may  help  reduce  AM  I/O  cost  is  pipelining  of  AM  I/O,  which  could  be 
achieved  by  dividing  the  AM  into  n- column  partitions  of  Kn  pixels  each,  where  n  «  Q .  This 
could  facilitate  processing  of  the  AM  on  a  network  of  P  RCs,  each  having  storage  capacity  of  Kn 
bits.  Following  a  brief,  high-level  description  of  the  proposed  architecture,  each  of  the  preceding 
methods  is  subjected  to  summary  analysis. 

B.1.2.  Proposed  Prototype  Architecture.  Let  P  processors  denoted  by  Pi,  Pi, ...,  P?  having 
local  memories  Mi,  M2,  ...,  Mp  be  connected  by  a  common  bus  comprised  of  two  mbit  input 
buses  B\,  Bi  and  one  2m-bit  output  bus  2?3,  as  shown  in  Figure  B.l . 

B.l.2.1.  Data  Bus  and  Control.  Let  each  processor  have  input  (output)  switched  from  (to)  the 
common  bus  by  mpole  (2n>pole)  single-throw  normally-open  switches.  For  example,  if  Pt 
accepts  input  from  Bi  or  B2,  then  input  switches  Sj,i  or  Si, 2  would  be  cbsed,  allowing  data  to  flow 
to  Pi.  (The  output  switch  for  the  i-th  processor  is  denoted  by  Si, 3.)  Each  of  the  three  banks  of 
switches  could  be  controlled  by  a  separate  control  bus  of  maximum  width  [log  P]  bits,  denoted 
by  Ci,  Ci,  or  C3. 

Alternatively,  the  aforementioned  switches  could  be  controlled  by  a  single  bus  of  width  [log3P] 

bits.  Although  such  multiplexing  might  save  bus  space,  the  presence  of  demultiplexing  hardware 
is  implied,  which  could  increase  circuit  complexity,  power  consumption,  and  (possibly)  I/O  time. 


B  =  Data  Bus  C  —  Control  Bus 


Figure  B.l.  Schematic  diagram  of  common-bus  architecture  for  RC  based  processing  ofTNE 
ATR  operations,  where  Pi  denotes  the  i-th  processor,  Si,i ,  Si, 2  (Si, 3)  denote  the  input 
(output)  switches  specific  to  Pi,  MMU  denotes  a  memory  management  unit,  and  Mi 
denotes  Pi ’s  local  memory. 

B.l.2.2.  Features  and  Advantages:  The  common  bus  would  allow  a  partition  of  data  to  be 
broadcast  to  multiple  processors,  e.g.,  a  K-pixel  sampling  block  in  TNE,  or  template/parameters 
required  by  non- TNE  paradigms  such  as  Boolean  template  matching  or  MISD-parallel 
processing  of  multiple  ATR  filters  applied  to  a  given  input.  The  switches  would  allow  different 
processors  to  receive  specific  data  partitions  or  (in  the  case  of  TNE)  MMU  instructions  that 
could  facilitate  retrieval  of  AM  data  from  local  memory.  The  latter  operation  could  be 
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implemented  via  bypassing  the  FPGA  with  a  separate  control  bus  derived  from  B\  or  B2,  as 
shown  in  Figure  B.l. 

B.l.2.3.  TNE  Processing  Elements:  In  practice,  TNE  produces  in  each  column  of  the  agreement 
map  a  vector  of  bits,  each  of  which  indicate  whether  or  not  a  vector  component  matches  the 
corresponding  value  in  a  template  at  a  particular  pixel  position  and  greylevel.  The  summation  of 
the  j-th  column  of  the  AM  indicates  at  what  level  the  sampling  block  matches  the  j-th  template. 
This  can  be  thought  of  as  somewhat  similar  to  the  Hamming  distance  that  might  be  computed  by 
matching  a  source  block  to  the  j-th  template,  under  a  prespecified  thresholding  criterion.  Thus, 
reconfigurable  processing  elements  (PEs)  for  TNE-based  ATR  could  in  principle  be  relatively 
simple,  since  each  processor  could  initially  compute  the  sum  of  each  column  in  a  partition  of 
AM  columns  assigned  to  that  processor.  This  would  yield  a  measure  similar  to  the  Hamming 
distance  between  the  current  sampling  block  and  each  of  the  templates  that  correspond  to  the 
given  partition  of  die  AM  column. 

Given  the  Boolean  set  B  =  {0,1},  for  purposes  of  architectural  development  we  adopt  as  our 
example  application  computing  the  sum  of  each  column  aj  e  BK  of  an  agreement  map  denoted 
by  a  e  B  KxQ  .  The  Hamming- like  distance  that  corresponds  to  a*  can  thus  be  expressed  as: 

Ai(aj)=Xaj  ,  (B.3) 

where  the  overbar  denotes  complement.  Assuming  that  K  =  2",  the  equivalent  image  algebra 
expression 

implies  that  a  modulo- 2m  counter  with  inverted  input  could  be  employed  to  compute  Dn. 


- Vcc 

PRE I 


fa)  (b) 

Figure  B.2.  Schematic  diagram  of  (a)  J-K flip-flop  constructed  of  10  nand  gates, 
and  (b)  a  modulo-2m  counter  constructed from  m  J-K flip-flops. 
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B.l.2.4.  Example.  In  a  prototype  architecture,  it  is  possible  that  such  counter  circuitry  could  be 
contained  within  a  functional  block  of  each  processor  Pi.  Using  existing  tools,  it  would  be 
possible  to  specify  such  functionality  in  VHDL  code,  which  could  be  implemented  in 
reconfigurable  logic  with  nand  or  nor  gates,  for  example,  as  shown  schematically  in  the  mbit 
counter  of  Figure  B.2.  Here,  10m  (8m)  nand  {nand  +  nor)  gates  are  required  for  a  modulo-2m 
synchronous  counter.  The  input  would  be  inverted  to  implement  the  Boolean  complement 
operation  shown  in  the  preceding  equation. 

B.l.2.5.  Space  Complexity.  Assume  that  a  given  PE  can  be  configured  without  incurring 
performance  penalties  during  execution  due  to  I/O  pathlength  dictated  by  the  VHDUto-hardware 
place-and-route  algorithm.  If  Ng  nand  gates  are  required  per  counter,  then  one  could  additionally 
assume  an  overhead  £  as  a  fraction  of  total  gates,  for  I/O  handling  (buffering,  etc.),  thereby 
yielding  a  total  of  Ng(l+i£)  gates  per  modulo- 2m  counter. 

For  purposes  of  simplicity,  assume  that  each  configurable  logic  block  (CLB)  in  an  FPGA  has 
an  integral  number  of  counters  only,  due  to  limitations  on  I/O  into  or  out  of  each  CLB.  Letting 
each  FPGA  have  H  CLBs  with  gates  per  block,  the  maximum  number  of  counters  per 
FPGA  fully  populated  with  counter  circuits  would  be  estimated  as 


Solving  the  preceding  equation  for  Ng  yields  the  following  conservative  bound  on  the  number  of 
gates  per  counter: 


I  Nc  |  (1  +  f,)  ' 

where  the  ceiling  function  reflects  the  previous  assumption  pertaining  to  the  number  of  counters 
per  CLB. 

B.l.2.6.  Example.  If  fg  =  0.2,  K  =  256,  each  counter  requires  N  =  8  •  logK  gates,  and  an  FPGA 


is  partitioned  into  Nb  —  1024  (32x32-element  array  of)  functional  blocks  of  Nbg  =100  gates  each, 
then 


N  <1024-  - -  =1024  counters 

[64(1  + 0.2)  J 

could  be  implemented  per  FPGA,  neglecting  other  on-chip  requirements. 

B.l.2.7.  Observation.  Each  FPGA  functional  block  may  require  that  additional  gates  be 
allocated  to  nomcomputational  work  such  as  buffering  data  routed  to  and  from  local  memory. 
Until  the  counters  are  coded  in  VHDL,  and  a  detailed  schematic  is  developed,  it  is  not  possible  to 
know  how  many  additional  gates  would  be  required. 

B.1.3.  Preliminary  Analysis.  We  begin  with  an  efficiency  analysis,  then  concentrate  on  I/O  and 
computational  cost  analysis  specific  to  the  implementation  of  TNE/ATR  on  the  architecture  of 
Figure  B.l. 

B.1.3. 1.  High-Level  Efficiency  Analysis.  Our  Phase- 1  research  indicates  that  the  key  constraint 
on  TNE’s  implementational  success  is  the  efficient  retrieval  of  agreement  map  data  from  the 
local  memory  of  each  processor.  Proceeding  from  the  general  to  the  specific  in  this  analysis  of 
efficiency  and  resource  constraints  and  given  an  execution  time  per  block  in  the  proposed  RC- 
based  architecture  (operating  in  non-pipelined  mode)  of  AtB,  we  note  (from  Equation  B.l)  that 
the  total  execution  time  per  frame  is  given  by 
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At,0,  =  NFR-AtB  , 

where  the  source  frame  size  is  denoted  by  N,  frame  rate  by  F,  and  replication  factor  by  R. 

Letting  F,R  =  1  for  purposes  of  simplicity,  the  total  computational  bandwidth  can  be 
expressed  as 

BWtot  <  — —  .  (B.6) 

°‘  N?tB  v  ' 

We  next  consider  the  role  of  I/O  in  the  formulation  of  AtB . 

B.l.3.1.1.  I/O  Constraints.  Assume  that  the  L  columns  and  K2  rows  of  the  template  database 
are  partitioned  P  ways,  such  that  each  local  memory  M  contains  n  =  ["L/P]  columns  of  the 
agreement  map.  This  implies  that  the  size  of  M;  is  givenby 

|Mi|^K2  Tl/pI  bits.  (B.7) 

B.l.3.1.1.1.  Example.  If  K  =  256,  L  =  16K  bits,  and  P  <  16 ,  then  |Mj|  >  2562  •  IK  bits  =  8M 

bytes.  If  P  =  4  (a  more  reasonable  value),  then  |mJ  >  2562  -4Kbits  =  32M  bytes,  which  is 
achievable  with  current  technology. 

Assume  that  the  dominant  component  of  AtB  is  the  I/O  time  required  to  retrieve  AM  data  from 
local  memory.  (This  assertion  is  supported  by  the  analysis  of  Section  B.  1.3.2.)  Letting  At,L^L 

denote  the  memory  cycle  time  and  Nmp  the  number  of  memory  read  ports,  we  have  the  following 
I/O-dominant  bound  per  block: 

.  Kn^t,™ 

At  > - 252. 

Nm 

up 

which  can  be  inverted  to  obtain  block- specific  bandwidth.  Via  substitution  of  the  preceding 
equation  into  Equation  (B.6),  total  bandwidth  can  be  elaborated  as 

ryvyr  ^  1  _  Nmp 

(fN-NXfK-KXfL-n)?^  ’  C3'8) 

where  the  fractions  ft,  fc,  and  £  were  defined  previously.  Solving  for  n,  the  bound  on  the 
number  of  AM  columns  stored  in  local  memory  is  given  by 

< _ Nmp _ 

(fN-NXfK  KXfL-BWtot).?tmem  •  <®-9> 

B.l.3.1.1.2.  Example.  Letting  fc,  fk,  t  =  1  for  purposes  of  conservative  analysis,  and  assuming 
that  N  =  1M,  K  =  256,  BWtot  =  1  Hz,  <4,  and  Atmem=  10‘8  sec  (i.e.,  10  ns  memory  cycle 

time),  we  obtain  n  <  4/(220  •  28  -10"*)  =  1.49  bits,  which  is  prohibitively  small. 

In  contrast,  letting  ft  =  0.1  (10  percent  of  the  source  pixels  are  interrogated),  =  0.5  (half  of 
the  block  contents  are  sampled),  and  At^  =  8x1  O'9  sec  (8  ns  memory  cycle  time),  one  obtains  n 
—  37  bits,  which  is  insufficiently  large  to  support  feasible  parallelism. 

However,  if  L  =  16K  bits,  then  this  implies  that  the  degree  of  parallelism  (i.e.,  number  of 
processors)  P  =  16K/37  =  443  processors,  which  is  prohibitive  for  current  FPGA-based 
architectures.  To  obtain  P<16  at  F,R  =  1,  there  must  be  realized  a  speedup  of  443/16  =  27.6:1 
or  greater. 

B.I.3.1.1.3.  Observation:  A  useful  method  for  increasing  this  processing  efficiency  is  the 
hierarchical  partitioning  of  the  column  coordinate  of  the  agreement  map.  For  example,  if  £  = 
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0.1,  then  n  can  be  brought  to  within  a  factor  of  2.8  (approximately  1 1)  of  the  value  required  for 
16- fold  (four- fold)  parallelism.  Over  the  duration  of  the  proposed  Phase- II  project,  the 
remaining  factor  of  2.8  could  possibly  be  compensated  by  increases  in  Nmp  and  At^  as  memory 
technology  develops. 

B.l.3.1.1.4.  Remark.  In  practice,  a  reduction  in  £  could  be  achieved  through  a  two-  or  three- 
level  scheme,  in  which  similar  templates  are  grouped  together  via  Boolean  operations  on 
neighborhoods  of  each  row  in  the  agreement  map.  This  would  yield  a  high-level  assembly  of 
composite  templates,  where  each  template  would  likely  have  reduced  resolution  with  respect  to 
the  original  templates  from  which  it  was  derived. 

B.l.3.1.1.5.  Example.  If  a  target  is  imaged  at  10  degree  rotational  increments  (azimuth  and 
zenith),  then  grouping  of  adjacent  templates  in  30  degree  disjoint  but  adjacent  azimuth  and 
zenith  angle  ranges  could  yield  a  nine-fold  reduction  in  L  (i.e.,  £  =  0.1 1). 

We  next  consider  the  space  requirements  of  the  architecture  shown  in  Figure  B.l. 

B.l.3.1.2.  Space  Complexity.  There  are  several  issues  to  consider  when  analyzing  space 
complexity  of  the  proposed  architecture,  which  include  (1)  bus  size  and  switching,  (2)  processor 
size  (number  of  gates),  partitioning  scheme  (e.g.,  flat- field  versus  hierarchical),  and  bus 
structure,  as  well  as  (3)  local  memory  size.  We  overview  each  of  these  issues,  then  make 
recommendations  for  further  analysis. 

B.l.3.1.2.1.  Bus  Complexity.  As  shown  in  Figure  B.l,  three  data  and  control  buses  are 
sufficient  to  compute  binary  image  operations  on  an  RC  based  processor.  In  Figure  B.l,  Si  and 
S2  have  width  m  bits,  and  S?  has  width  2m  bits.  The  control  buses  Ci  through  C3  each  have 
width  [log  P]  bits,  for  a  total  bus  space  complexity  of 

S(m,  P)  =  4m  +  3  •  [log  P]  bits  . 

Note  also  that  buses  do  not  have  convoluted  shapes,  and  could  thus  be  straightforwardly 
routed  on  a  multi- layer  circuit  board. 

B.l.3.1.2.2.  Example.  If  source  imagery  has  eight  bits  per  pixel  and  16-fold  parallelism  is 
employed,  then  m  =  32,  P  =  16,  and  5"(m,P)  =  32  +  3(4)  =  44  bits,  which  is  not  prohibitive  in 
terms  of  current  board  fabrication  technology.  If  P  =  8  (P  =  4),  then  S(m,P)  =  41  bits  (38  bits), 
which  are  less  stringent  than  the  first  case. 

B.l.3.1.2.3.  Processor  Size.  A  key  concern  for  efficient  RC-based  processing  is  whether  or  not 
the  RC  processing  chip  set  has  sufficient  gates  to  implement  the  required  intra-processor 
parallelism.  In  Equation  (B.4)  we  expressed  the  number  of  counters  that  could  be  derived  from  a 
reconfigurable  logic  chip  of  Ng  gates.  As  stated,  we  expect  that  approximately  one  counter  per 
CLB  could  be  obtained.  This  reduces  the  issue  of  intra-processor  parallelism  to  determining  the 
number  of  available  CLBs.  Let  us  make  a  best-case  assumption,  namely,  that  all  CLBs  on-chip 
would  be  available  for  counter  implementation,  and  that  all  counters  could  be  run  simultaneously 
(e.g.,  sufficient  I/O  redirection  to  permit  efficient  transmission  of  each  AM  column  to  each  RC 
processor).  The  feasibility  of  this  assumption  is  further  explored  in  Section  B.3.3. 

We  have  stated  that  an  AM  partition  size  of  n  =  IK  columns  could  be  accomodated  in  the 
worst  case,  that  is,  in  the  absence  of  partitioning  of  the  AM  column  coordinate  (e.g.,  £  =  1). 
This  implies  degree  of  parallelism  of  IK  counters  (e.g.,  a  32x32-element  matrix  of  sufficiently 
large  CLBs  on-chip)  in  order  to  process  all  columns  concurrently.  Fortunately,  such  parallelism 
is  available  with  current  FPGA  technology.  The  case  where  £  <  1  would  obviously  present  less 
of  an  implementational  challenge. 

B.l.3.1.2.3.  RC  Processor  Partitioning.  The  majority  of  FPGAs  or  RC  chips  are  configured  as 
a  flat  array  of  gates,  with  high-speed  communication  buses.  The  regions  of  the  gate  array 
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delimited  by  the  buses  are  called  configurable  logic  blocks.  Each  CLB  has  essentially  the  same 
bus  access  priority  and  can  support  the  same  type  of  circuits  as  other  CLBs.  Cross-block 
mapping  is  not  necessarily  problematic,  except  for  exotic  memory  applications. 

In  contrast,  there  exist  several  hierarchically  partitioned  FPGA  architectures,  notably  the 
Actel  Eclipse,  with  three  levels  of  partitioning,  namely,  CLB,  functional  block  (FB),  and  gate.  In 
such  devices,  a  given  CLB  or  FB  can  be  programmed  to  have  different  bus  access  priorities,  data 
transfer  speed,  and  gate  interconnections,  independent  of  other  partitions.  As  a  result,  a  variety 
of  complex  circuits  are  possible,  for  example,  using  small  partitions  of  each  CLB  for  address 
incrementation,  larger  partitions  for  integer  adders,  and  one  or  more  CLBs  for  multipliers  or 
division  circuits. 

Challenges  involved  in  the  programming  of  FPGAs  include  external  I/O,  inter-block  and 
inter-chip  communication,  programming  hierarchical  FPGAs.  The  issue  of  external  I/O  buses 
has  been  discussed  previously  with  respect  to  the  problem  of  obtaining  AM  data  stored  in  local 
memory.  Communication  between  CLBs  and  FPGA  chips  is  supported  in  various  ways  by 
different  manufacturers  and  on  different  chips.  Since  FPGA  technology  appears  to  be  in  its  late 
infancy,  it  is  difficult  to  predict  what  capabilities  will  be  offered  in  future  devices.  However,  for 
the  architecture  of  Figure  B.l,  it  would  be  obvious  to  concentrate  on  high  inter-CLB 
communication  bandwidth  (i.e.,  fast  internal  buses  on-chip),  as  well  as  support  for  fast  external 
I/O.  For  example,  Xilinx  is  a  leader  in  this  field,  and  supports  segmented  routing,  which  takes 
longer  to  route  cross-chip  than  within-chip.  In  contrast,  Altera  offers  more  efficient  cross-chip 
routing,  but  penalizes  routing  within-chip. 

The  programming  of  flat- field  FPGAs  has  fewer  associated  challenges  than  programming 
hierarchically- structured  FPGAs,  due  to  more  degrees  of  freedom  in  the  latter  partitioning  model. 
For  example,  a  VHLD- to- hardware  compiler  for  hierarchical  FPGAs  must  determine  not  only 
the  arrangement  of  circuit  components  (e.g.,  links  between  gates  in  the  2-D  gate  array)  but  also 
must  perform  near-optimal  place-and-route  in  the  presence  of  a  third  dimension  (i.e.,  level  of 
paritioning).  As  a  result,  we  recommend  the  use  of  a  flat-field  Xilinx  FPGA  (e.g.,  Virtex  series) 
for  a  prototype  architecture  in  a  possible  Phase- II  effort. 

B.l.3.1.2.4.  Local  Memory  Size.  An  additional  implementational  issue  is  the  size  of  local 
memory  required  to  store  sufficient  information  for  the  agreement  map  partition  assigned  to  Pi . 
Equation  (B.7)  succinctly  describes  the  memory  size  |M;|  required  to  implement  retrieval  of  an 
n-column  partition  of  the  agreement  map.  Additional  data  to  be  stored  in  local  memory  might 
include  an  access  frequency  table  (e.g.,  a  2-D  histogram  having  domain  coordinates  that 
correspond  to  K  greylevels  and  K  pixel  positions).  This  table  would  record  the  cumulative 
frequency  of  access  for  a  given  source  pixel  intensity  at  a  given  position,  and  could  be  used  by 
an  intelligent  MMU  to  facilitate  more  efficient  memory  access  via  a  prefetch  buffer.  For 
example,  if  the  buffer  stored  the  256  most  frequent  accesses  to  AM  data,  then  the  additional 
space  requirement  in  M  would  equal  256n  bits.  For  n  =  IK,  (32Mbits  overhead)  this  is  not  a 
significant  additional  cost. 

B.l.3.2.  Functional  Block  Timing  Analysis.  In  order  to  understand  the  role  that  each  CLB 
plays  in  the  determination  of  complexity  and  cost  of  TNE  algorithm  implementation,  we  present 
the  following  algorithm  for  processing  one  column  of  the  agreement  map. 

B.l.3.2.1.  Algorithm.  Let  a  K2-row  by  n-column  partition  of  the  agreement  map  be  stored  in 
memory  Mi  of  processor  Pi,  where  i  =  1..P.  To  retrieve  the  AM  data  specific  to  a  given  sampling 
block,  then  process  the  data  and  send  results  back  to  the  host  unit,  the  following  execution 
sequence  is  recommended  for  the  architecture  of  Figure  B.l: 
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Step  1.  Submit  Block  Contents  to  Processor  -  Host  unit  closes  all  switches  on  Bi  and  sends 
retireval  codes  down  B2  to  the  MMU  for  M*.  Each  retireval  code  consists  of  a  transformation  of 
each  pixel  (x,a(x)),  where  x  is  a  domain  point  (pixel  coordinates  in  sampling  block)  and  a(x) 
denotes  the  pixel  value.  In  practice,  fK  -K  retrieval  codes  would  be  broadcast  to  all  P\. 

Step  2.  Retrieve  Salient  AM  Partition  -  Each  processor  could  pass  the  incoming  retrieval  codes 
to  the  MMU  from  Bi,  or  the  processor  could  be  circumvented  via  the  small  interconnecting  bus 
shown  in  Figure  B.l.  The  latter  alternative  is  faster,  but  involves  more  hardware.  The  MMU 
would  retrieve  the  fK  •  Kn  required  bits  from  Mi,  which  would  be  sent  to  P\. 

Step  3.  Compare  test  vector  from  sampling  block  with  agreement  map.  For  example,  compute 
the  Hamming  or  Euclidean  distance  between  the  test  vector  and  each  column  of  the  AM. 

Step  4.  Output  comparison  results  serially  from  PEs  P\  by  (a)  closing  $,3  and  ensuring  that  all 
other  switches  on  #3  are  open,  then  (b)  sending  the  results  from  Pi  down  P3,  (c)  opening  S^. 
B.l.3.2.2.  Analysis.  The  preceding  algorithm  can  be  analyzed  by  decomposing  each  step  into  its 
components,  to  which  is  assigned  a  time  delay.  The  following  notation  pertains: 


Table  B.l.  Symbols  for  timing  analysis  of  CLB  based  processing  of  a  sampling  block. 


Symbol 

At, 


BiSW 


^  BOM 

At 


MMPR 


At  Ml/o  (n  ) 

Atp©s 


Description 


Opening  or  closing  of  switches  on  bus  B[ 

Transmission  of  an  «-bit  result  along  bus  B\ 

Processing  of  one  MMU  retrieval  code 

Retrieve  n  bits  from  local  memory  Mj 

Compute  distance  between  test  vector  and  AM  column 


As  noted  previously,  AtPjDS  could  represent  the  time  required  to  compute  the  Hamming  or 

Euclidean  distance  between  the  test  vector  and  a  column  of  the  agreement  map.  Given  die 
preceding  notation,  an  expression  of  the  total  processing  time  for  one  sampling  block  being 
compared  with  the  AM  partition  retrieved  from  M  and  stored  on  Pi  is  given  by  the  following 
mathematical  model: 


^B2SW  “^“^B2XM 

[Step  1] 

+  K  *  (AtMMPR  +  AtMI/0  (n)) 

[Step  2] 

+  n  •  AtPfl>s 

[Step  3] 

(B.10) 

+  P  •  (2 At  B3SW  +  n  ■  At  B3XM )  . 

[Step  4] 

B.l.3.2.3.  Observation.  Possible  value  ranges  for  variables  in  the  preceding  equation  include 
those  listed  in  Table  B.2,  below.  To  determine  the  I/O  cost,  inspection  of  the  second  line  of 
Equation  (B.10)  shows  that  K  •  (AtMMPR  +  AtMI/0(n))  is  the  dominant  term,  since  the  memory  cost 
AtMI/0(«)  for  n  =  1  is  multiplied  by  Kn,  assuming  linearity  of  I/O  cost. 
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B.l.3.2.4.  Example.  Let  AtMI/0(«)  =  n(10  ns),  and  n  =  n  =  1024,  with  AtMMPR  =  10  ns.  In  this 
case,  the  second  line  of  Equation  (B.10)  would  have  the  dominant  term 

K(n  +  1)(10  ns)  =  2.624xl0‘3  sec 

per  source  pixel.  Note  that  this  is  not  a  function  of  the  type  of  processor  employed  (e.g.,  DSP, 
embedded  parallel  processor,  or  FPGA),  but  rather  an  inherent  problem  in  the  movement  of  large 
amounts  of  data  across  local  memory  buses.  In  contrast,  if  n- fold  parallelism  is  available  on  each 
processor  (e.g.,  n  counters  running  concurrently  to  compute  Ai  applied  to  each  AM  partition’s 
column),  then  the  computational  cost  represented  by  line  3  of  Equation  (B.10)  would  be  given  by 

1.02  x  10‘4  sec  <  n  •  AtPjDS  <  2.05  x  10'5  sec  , 

which  is  1/128  to  1/26  of  the  I/O  cost  estimated  above.  Thus,  in  a  realistic  model  of  computation 
the  I/O  cost  predominates,  which  validates  the  discussion  of  Section  B.  1 .2. 

Table  B.2.  Example  value  ranges  for  analysis  of  CLB  based  TNE  processing. 

Conservative  Estimates  of  Model  Parameter  Values 
1  ns  to  10  ns 

1  ns/bit  to  10  ns/bit  (100  Mbps  to  1  Gbps) 

5  ns  to  15  ns  (fast  cache  code- to- address  translation) 

10  ns  to  50  ns  per  access  (primary  memory  assumed) 

20  ps  to  100  ps  (10MHz  to  50MHz  counter  BW) 

64  to  1024  pixels  per  sampling  block 
~lKbits,  to  be  computed  from  Equation  (B.9) 

2  to  16  processors 

B.l.3.3.  Processor  Timing  Analysis.  In  any  realistic  analysis  of  computational  cost,  processor 
timing,  and  system  performance,  one  must  estimate  the  computational  delay  incurred  by  the 
processor  itself.  In  the  preceding  model,  this  delay  is  represented  by  the  variable  AtPjDS. 

Assume  that  the  counter-based  model  of  Z)H  computation  forms  fie  basis  for  our  original 
formulation  of  the  architectural  design  shown  in  Figure  B.l.  Further  assume  that  there  are  K 
pipelines  per  processor  P\,  each  of  which  implement  the  previously-discussed  counter  process. 
We  initially  make  this  assumption  to  show  that  computation  of  TNE-based  pattern  matching 
could  be  with  K-fold  parallelism.  The  following  three  observation-example  pairs  show  that  this 
assumption  is  untenable  in  practice  for  the  worst  case,  where  the  sampling  block  and  AM  are 
fully  processed.  However,  as  shown  in  Sections  B.l.3.3.7-8,  it  is  possible,  with  partial  or  full 
pipelining  of  AM  data  through  the  counter,  to  process  n Ip  columns  of  the  AM  at  K  pixel  shifts  of 
each  input  sampling  block.  This  holds  especially  well  for  fk,lL  <  1. 

B.l.3.3.1.  Observation.  We  begin  this  analysis  of  processor  complexity  by  noting  that,  if  the 
topologically  longest  path  through  the  m-bit  counter  consists  of  Npg  gates,  then  the  delay  in  non- 
pipelined  processing  of  each  AM  column  would  be  computed  as 

A*  p/ds  =  K  ‘  Npg  •  At  clock ,  (B.  1 1) 

where  the  variable  Atclock  denotes  one  clock  cycle  delay  in  Pi. 

B.l.3.3.2.  Example.  If  an  FPGA  is  employed  with  100MHz  clock,  then  Atclock  =  10  ns,  which 
implies  that,  for  a  counter  with  longest  path  of  32  gates  (Npg  =  32)  and  a  K  =  256  element 
sampling  block,  then  AtPjDS  =256(32)10  ns  *  8.2xl0'5  sec,  or  82  ps.  This  implies  that  the 
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temporal  delay  component  of  Equation  (B.10),  given  by  n-AtPiDS,  equals  approximately  84ms 
for  a  1024-column  AM  partition  computed  on  P\. 

For  a  frame  rate  of  F  =  lfps,  this  would  mean  that,  given  worst-case  constraints  ft  =  1  and  £ 
=  1  with  image  resolution  of  N  =  10242  pixels,  then  ft  =  1.14  x  10'5,  or  0.001 14  percent  of  source 
pixels  are  interrogated.  This  statement  can  be  verified  by  observing  that 

fN  — _ I _ 

1024 2  pixels/fra  me  •  84  msec/pixel  •  1  frame/sec  ’ 

Note  that,  if  ft  =  0.5  and  ft  =  0.1,  as  discussed  previously,  ft  would  still  equal  2.28  xlO-4 ,  which 
means  that  only  240  source  pixels  would  be  interrogated  by  TNE/ATR  in  the  preceding  case, 
which  we  implicitly  assume  (for  purposes  of  this  analysis)  to  be  the  best  case.  However,  this 
assessment  may  vary  in  practice. 

B.l.3.3.3.  Observation.  The  preceding  values  for  ft  are  both  infeasibly  low,  and  derive  from  a 
value  of  AtPlDS  that  has  been  set  artificially  high  due  to  the  unrealistic  assumption  of  non- 

pipelined  execution.  Since  we  are  assuming  that  the  FPGA  or  RC  processor  Pi  is  a  clocked 
(synchronous)  circuit,  it  is  not  unreasonable  to  assume  that,  in  the  absence  of  set-up  time,  anNpg- 
gate  processor  path  could  be  cleared  in  bjjg  clock  cycles,  filled  in  h[,g  cycles  with  partially 
processed  results,  and  require  K  cycles  for  execution  of  the  counter  process,  thereby  yielding  a 
total  time  delay  that  can  be  expressed  in  terms  of  partially  pipelined  execution  in  terms  of  an 
execution  time 

AtP©s  =(K  +  2Npg)-Atcl0Ck  .  (B.12) 

This  represents  a  speedup  of 


?  K-Npg-nc^  K’Npg  k 

n’L  k+2N  J-?U,  "  K+2N„  "  K/N„g  +  2  ~  » 


(B.13) 


where  the  approximation  is  valid  for  Npg  »  2. 

B.l.3.3.4.  Example.  If  K  =  256,  Npg  =  32,  and  Atclock  =  10  ns,  then 

AtZs  =  (K  +  2Nps)  •  Atc.odc  =  (64  +  256)10  ns  =  3.2  p is  , 

which  implies  that  n  •  AtPiDS  ~  3.3  ms  for  a  1024-column  AM  partition  computed  on  P{. 

Following  Example  B.l.3.3.2,  given  ft, ft  =  1  and  N  =  10242  pixels,  this  implies 
fN  =  2.89  xlO"4  or  0.029  percent  of  source  pixels  are  interrogated.  If  ft  =  0.5  and  ft  =  0.1,  then 

fN  *  5.78  xl(T3,  or  0.578  percent  of  source  pixels  are  interrogated  by  the  TNE  ATR  algorithm. 
In  a  10242 -pixel  image,  this  equates  to  6,061  candidate  target  pixels. 

B.l.3.3.5.  Observation.  The  preceding  situation  can  be  further  improved  by  not  clearing  the 
pipeline  at  the  arrival  of  each  AM  column  corresponding  to  an  input  pixel,  except  to  remove 
possible  spurious  bits  before  the  first  sequence  of  K  bits  (the  first  column  of  each  agreement  map 
partition)  is  processed.  This  implies  that  full  pipelined  processing  of  a  K-row  by  N-column  AM 
partition  would  require  time 

=  n  -AtP,DS  =  (Kn  +  2Npg)  •  Atclock  cycles ,  (B.14) 

which  represents  a  savings  of  (n  -  2)  •  Npg  cycles,  but  a  speedup  that  remains  as 

9  _  n '  1  £ PIPS  _  ^Pg  '  ^  ^ clock  _  '  ^pg  _  Kn 

■f  Ti  (Kn  +  2Npg).?tclock  '  Kn  +  2Npg  “  KiyNpg  +2  “  P£  (  ’  } 


(B.14) 


(B.15) 
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as  in  Observation  B.l. 3.3.3.  However,  the  reliability  of  this  circuit  could  be  compromised  by 
lack  of  clearing  the  counter  prior  to  processing  of  each  column  corresponding  to  an  input  pixel. 
B.l.3.3.6.  Example.  Selecting  again  the  values  K  =  256,  Npg  =  32,  and  Atclock  =  10  ns,  we  obtain 
the  following  result  for  fc,fL  =  1 : 

T*  =  (Kn  +  2Npg)  •  At  clock  =  (256(1024)  +  64)10  ns  =2.63  ms, 
which  represents  a  25.5  percent  speed  increase  over  the  value  of  n  •  Atpp  =  3.3  ms  computed  in 

Pi'DS  r 

Example  B.l. 3. 3.4. 

Similar  to  Exampled  B.l.3.3.2  and  B.l.3.3.4,  this  implies  that  fN  « 3.62 xlO-4 or  0.036 
percent  of  source  pixels  are  interro^ted.  If  £  =  0.5  and  £  =  0.1,  then  fN  ~  7.2  xlO-3,  or  0.72 
percent  of  source  pixels  are  interrogated.  In  a  10242-pixel  image,  this  means  that  7,549  pixels 
would  be  interrogated  as  candidate  targets.  This  equates  to  approximately  one  target  of  size 
128x64  pixels,  seven  targets  of  size  32x32  pixels,  or  29  16xl6-pixel  targets,  which  are  realistic 
expectations  for  small-target  imagery. 

B.l.3.3.7.  Observation.  The  numbers  of  targets  discussed  at  the  conclusion  of  the  preceding 
example  are  sufficient,  but  are  barely  adequate  for  fields  that  contain  a  large  number  of  targets 
(e.g.,  terrestrial  surfaces  containing  broadcasted  anti-personnel  mines).  The  reason  for  this  small 
number  of  targets  (and  large  value  of  the  product  n  -AtPjDS)  is  that  we  have  assumed  (at  the 
outset  of  this  subsection)  that  there  are  but  K  counter  pipelines  available  per  processor  (e.g.,  the 
case  of  a  fast  serial  DSP).  If  an  mbit  counter  requires  fewer  than  6m  gates  (as  in  the  case 
discussed  in  Section  B.  1.2.4),  and  incurs  a  longest-path  delay  of  <  3m  gates,  then  this 
implies  that,  for  m  =  8,  fewer  than  50  gates  per  counter  would  be  utilized,  for  a  total  of  fewer 
than  12,800  gates  if  K  =  256.  By  inspection  of  Equation  (B.4),  it  is  possible  to  implement  many 
more  than  K  such  counters  in  a  100,000- gate  FPGA.  Hence,  it  is  reasonable  to  consider  the 
presence  of  p  »  K  pipelines  on-chip. 

In  order  to  address  the  case  of  parallel  (and,  possibly,  reconfigurable)  processors  P\  we 
modify  that  assumption  to  assume  the  mplementation  of  Hp  »  K  pipelines  per  processor, 
which  describes  the  case  of  fine-grained  parallelism  (as  opposed  to  the  term  P  used  to  designate 
degree  of  coarse-granular  parallelism  expressed  as  the  number  of  processors).  In  this  case,  the 
time  estimates  in  Equations  (B.ll,  B.12,  B.14)  can  be  scaled  quasilinearly  with  parallelism  per 
pixel  shift  p  =  Ncp/K  pipelines  available  per  pixel  shift,  to  provide  a  cost  estimate  for  processing 
K  pixel  shifts  of  a  given  sampling  block  as 

(B.ll') 

nACs=  7  (K+2N,g).AttM  (B.12') 

Tl  =(K{^j  +  2N„,).At,M,  (B.14') 

The  following  example  calculates  execution  time  for  the  preceding  expressions. 

B.l.3.3.8.  Example.  Let  Nq,  =  2048  pipelines  per  processor,  and  assume  (for  purposes  of 
simplicity)  that  the  requisite  I/O  bandwidth  is  available  to  circumvent  on-chip  I/O  bottlenecks. 
As  before,  let  K  =  256,  n  =  1024,  Npg  =  32,  and  AtcIock  =  10  ns,  which  implies  that 
p  =  Ncp  /  K  =  2048/356  =  8 .  The  following  results  pertain: 


84 


STTR  Army02-T004  Topic  Title:  Analysis  and  Characterization  of  Pattern  Classifiers; 

GASP  •  Generator  for  Adaptive  Statistical  Pattern  Recognition  Systems  January  31 ,  2003 

n  ■  Atp<DS  =  1024  •—  •  32- 10  ns  =  10.5  ms 
8 

n  •  Atpp  =-^-  (256  +  64)- 10  ns  =0.409  ms 

PiDS  g  v  ' 

T*  =  (256 -^^  +  64) -10  ns  =0.328  ms. 

“  8 

In  the  latter  case,  the  speedup  equals  2.63ms/0.328ms  =  8,  which  is  exactly  p.  Hence,  it  is 
possible  to  remedy  the  problems  of  low  speed  without  resorting  to  source  block  subsampling 
(i.e.,  setting  ft  <  1).  The  resulting  speedup  due  to  hierarchical  partitioning  of  the  AM  column 
coordinate,  which  we  have  shown  could  approximate  fL  =  0.1,  would  further  increase  the 
efficiency  of  processing. 

B.l.3.3.9.  Remark.  In  order  to  further  increase  system  throughput  in  terms  of  number  of  source 
pixels  interrogated,  we  foresee  the  employment  of  three  potential  technologies,  which  are: 

1.  Faster  FPGA  or  RC  clock  speed  -  For  example,  Xilinx  is  developing  a  5  million  gate  FPGA 
with  estimated  clock  speed  of  1  GHz.  If  current  technology  trends  in  FPGA  development  mirror 
the  speed  increases  realized  thus  far  in  SISD  CPU  development,  then  one  can  expect  FPGA 
clock  rates  in  excess  of  3  GHz  within  the  next  three  years.  In  that  case,  the  values  of  ft  (and, 
concomitantly,  the  values  of  n  -AtPl.DS)  would  be  increased  (decreased)  by  a  factor  of  over  1.5 
orders  of  magnitude  in  comparison  with  values  derived  in  Example  B.  1.3 .3. 6.  This  would 
potentially  improve  target  acquisition  capabilities  under  the  preceding  assumptions  to  over  2,000 
16x1 6-pixel  targets,  or  more  than  170  32x32-pixel  targets.  Alternatively,  ft  could  be  increased 
for  greater  statistical  coverage,  and  different  partitioning  schemes  that  might  employ  higher 
resolution  in  the  template  domain  (e.g.,  ft  more  closely  approaching  unity)  could  be  employed. 

2.  Higher  degrees  of  on-chip  parallelism  -  Given  an  increasing  number  of  gates  in  FPGA 
designs  (e.g.,  20  million-gate  FPGAs  are  expected  in  commercial  quantity  within  five  years), 
more  pipelines  could  be  replicated  on-chip,  thus  facilitating  increased  parallelism  in  AM  column 
input.  This  would  decrease  the  degree  of  interleaving  or  replication  of  the  delay  n  •  AtKDS 

required  for  ATR  under  the  prespecified  constraints  of  lfps  and  lKxlK-pixel  imagery,  making 
n-fold  parallelism  on-chip  a  reality,  subject  to  the  constraints  of  the  von  Neumann  bottleneck 
between  Pi  and  M;. 

3.  Increased  I/O  parallelism  through  optical  backplane  interconnects  -  Although  not  a  likely 
development  in  FPGA  technology  in  the  proposed  Phase-II  project  period,  it  is  reasonable  to 
assume  that  current  experiments  that  attempt  to  increase  broadcast  scope  and  speed  in  closely- 
coupled  parallel  processors  could  be  eventually  migrated  to  FPGAs,  but  on  a  very  small  scale. 
Chief  among  the  technologies  for  large-scale  instruction  and  data  broadcast  in  electro- optical 
backplanes  is  multi-way  communication  facilitated  by  such  innovations  as  wavelength-division 
multiplexing,  spread-spectrum  encoding,  and  highly  miniaturized  array  detectors. 

B.1.4.  Algorithmic  and  Implementations!  Issues.  In  addition  to  the  cursory  analysis  presented 
in  the  preceding  sections,  we  discuss  two  technical  issues  of  interest  to  the  sponsor  (FTI). 
Initially,  we  present  a  brief  comparison  of  computational  cost  for  different  pattern  matching 
approaches  that  appear  to  be  significant  to  the  proposed  Phase-II  effort. 

The  Hamming- like  method  of  pattern  matching  discussed  in  Section  B.  1.2  is  neither  the  only 
nor  the  optimal  method  for  matching  sampling  block  contents  to  a  template  database.  For 
example,  the  Euclidean  distance  between  two  Boolean  feature  vectors  can  be  computed  from  Ai- 
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It  is  interesting  to  note  that,  in  the  case  of  comparing  Boolean  vectors,  the  Hamming  and 
Euclidean  distances  can  both  be  computed  via  the  XOR  operation. 

For  example,  given  two  Boolean  vectors  a  =  (a,,a2,...,aK)  and  b  =  (bj,b2,...,bK),  the 
Hamming  distance  can  be  expressed  as: 

=  I(a  xor  b) , 

while  the  Euclidean  distance 

db = ^X(ai  -bi  f  =  -/£(a  xor  b) 

can  be  computed  from  the  Hamming  distance  external  to  Pi  (e.g.,  in  the  FPGA  host).  A  similar 
comment  holds  for  the  RMS  distance 

£>RMs=-j^-X(ai“bi)2  =^-Z(axoRb)  =VAi/K  • 

As  a  result,  the  summation  of  XOR  results  constituent  to  the  Euclidean  and  Hamming 
distances  would  require  the  same  computational  cost  in  the  architecture  of  Figure  B.l,  regardless 
of  which  of  these  two  distances  was  being  computed. 

It  is  important  to  note  that  the  preceding  concept  is  relevant  for  Boolean  vectors  only.  For 
real- valued  vectors  a  and  b,  the  Hamming  distance  is  undefined  unless  a  thresholding  operation 
or  other  type  of  comparison  operation  is  introduced,  which  would  yield  a  Boolean  result. 

Additionally,  we  note  that  matching  of  real- valued  vectors  on-chip  would  require  integer  or 
floating-point  addition  and  multiplication  at  the  very  least,  both  of  which  are  space-consumptive 
and  tend  to  result  in  unfeasibly  slow  circuits  for  current  FPGA  implementations.  For  example, 
given  a  pattern  class  p  and  its  covariance  matrix  C,  together  with  a  mean-pattern  m  comprised  of 
K  elements,  the  Mahalanobis  distance  between  a  K-element  test  vector  a  and  m  would  be 
computed  as 

Du  =(a-m)'C(a-m)  , 

which  implies  multiplication  of  a  lxK-element  vector  with  a  KxK-element  matrix  to  yield  a 
lxK-element  vector  that  is  then  multiplied  by  a  Kxl -element  vector  to  yield  a  scalar.  This 
requires  0(K2)  multiplications,  which  should  be  floating  point  operations  if  p  is  large  (to  achieve 
sufficient  discrimination  between  patterns  represented  in  C  and  m. 

In  a  Phase-II  study,  we  propose  to  model  FPGA-based  architecture  performance  in  terms  of 
I/O  bus  bandwidth  and  collision  effects  associated  with  realistic  packet  size  (e.g.,  several 
columns  of  a  TNE  agreement  map  per  packet).  Additionally,  we  propose  to  model  the  effect  of 
packet  transmission  errors  and  retransmissions  on  architecture  performance,  as  well  as  MNN- 
directed  TNE  classification  accuracy. 

In  the  former  case  (performance),  bus  collisions  can  significantly  reduce  external  I/O  bus 
bandwidth,  creating  an  I/O  bottleneck  that  effectively  reduces  CPU  utilization,  thereby 
penalizing  processor  performance.  In  the  latter  case  (classification  accuracy),  if  information 
from  TNE’s  binary  pointer  table  contained  in  the  agreement  map  is  corrupted  when  transmitted 
over  external  or  internal  databus(es),  then  the  AM  would  no  longer  contain  correct  partial 
classification  results.  Whether  or  not  MNNs  have  sufficient  noise  tolerance  to  maintain 
classification  accuracy  (and,  therefore,  Pd  and  Rfa)  in  an  MNN-directed  TNE  paradigm  with 
significant  bus  errors  is  unknown.  Additional  research  is  therefore  indicated  in  performance 
simulation  for  TNE  architectures,  as  well  as  the  incorporation  of  fault  models  and  fault  isolation 
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procedures  in  reconfigurable  processors  with  different  types  of  external  buses  (e.g.,  PCI-bus 
versus  Myrinet). 


B.2.  ATR  Performance  Metrics  for  MNN-Directed  TNE 

In  previous  research,  the  authors  have  employed  several  measures  of  Pd  and  Rfa,  which  are 
(a)  computed  componentwise,  (b)  area-based,  or  (c)  pixel-based.  The  componentwise  Pd  is 
computed  as  follows: 

Step  1.  Apply  an  ATR  filter  such  as  die  ADGF  to  a  test  image,  then  threshold  or  otherwise 
filter  the  result  to  obtain  pixels  deemed  significant,  which  comprise  an  image  b.  Morphological 
filtering  can  be  used  in  this  step,  to  reduce  the  number  of  noise  components  whose  size  is  well 
outside  the  range  of  expected  target  size. 

Step  2.  Sequentially  label  the  connected  components  in  b  to  produce  an  image  c  e  Zmx 
(not  to  be  confused  with  a  compressed  image  c  discussed  in  Section  2).  Given  a  truthing  image  f 
e  {0,1  }x,  which  denotes  target  position  and  extent  by  unitary  values,  compute  the  fraction  of 
pixels  that  are  correct  detections  as 


(B.16) 


and  the  fraction  of  pixels  on  target  in  f  that  are  false  alarms  in  b,  as  follows: 


(B.17) 


where  f  denotes  the  complement  of  f. 

Step  3.  Assume  that  we  have  a  function  fc  :  {0,1  }x  ->  N  that  counts  the  number  of 
connected  components  in  a  Boolean  image  on  X.  For  example,  in  Step  2,  n  =/c(b).  Compute 
the  probability  of  detection  Pd  from  the  number  of  components  Nt  =  /c(f)  and  the  number  of 
components  Na  that  have  common  pixels  in  b  and  f,  as 

Pd  =  wcpc(l,Nd/Nt)  • 

The  upper  limit  of  Pd  is  clamped  to  the  unitary  value,  since  Pd  is  a  probability  and  there  can  be 
more  than  one  target  component  in  b  per  target  component  in  f.  One  similarly  computes  the 
number  of  false  alarms  as  the  number  of  components  Nf  that  have  one  or  more  pixels  located  in  b 
but  have  no  pixels  at  the  corresponding  locations)  in  f.  We  can  express  this  in  terms  of  image 
algebra,  as  follows: 

Nf=/C(bf). 

Observe  that  the  number  of  detected  components  (e.g.,  N,  Nf)  will  depend  on  the  detection 
technique,  in  particular,  upon  the  postprocessing  operations  that  are  applied  to  the  output  of  the 
ADGF.  In  practice,  we  apply  morphological  filters  to  the  thresholded  ADGF  output,  to  reduce 
the  incidence  of  spurious  noise. 

One  of  the  problems  associated  with  componentwise  Pd  and  Rfa  metrics  is  computational 
cost  -  given  an  N-pixel  image,  connected  component  labeling  can  require  as  many  as  0(N2) 
operations  for  serpentine  components.  As  a  result,  we  prefer  to  approximate  Pd  and  Nf  by  Fd  and 
F fa,  which  can  be  computed  on  parallel  processors,  as  shown  in  the  vector-parallel  formulation  of 
Equation  (B.16)  and  (B.17).  In  preliminary  tests,  we  have  found  that  Fd  and  Ffa  approximate  Pd 
and  Nf  (in  the  latter  case,  with  some  manipulation)  to  within  10  to  15  percent  error  in  imagery  of 
natural  scenes  with  sparse  convex  targets. 
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In  a  Phase-II  study,  we  propose  to  analyze  ROC  curves  constructed  from  Fa  and  Ffa,  to 
determine  their  correspondence  with  ROC  curves  constructed  from  Pa  and  Rfa  (derived  from  Nf). 
We  also  plan  to  show  that  Fa  and  Ffa  are  much  more  feasible  computationally  for  real-time 
detection  of  performance  deficits  in  support  of  dynamically  adaptive  MNN-directed  TNE. 
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